# Install packages
install.packages("broom")
install.packages("ggplot2")
install.packages("glmnet")
install.packages("gt")
install.packages("nhanesA")
install.packages("randomForest")
install.packages("scales")
install.packages("survey")
install.packages("tidyverse")
install.packages("tidymodels")
install.packages("xgboost")
install.packages("vip")
# Install dietaryindex packages
devtools::install_github("zxucmu/dietaryindexNDP", ref = "dietaryindexNDP_1.0.0")Dietary Determinants of Allergic, Immunologic, and Cardiometabolic Diseases
BMIN5030/EPID600 Final Project
1 Overview
This project used data from the nationally representative National Health and Nutrition Examination Survey (NHANES) to explore links between dietary determinants and allergic, immunologic, and cardiometabolic outcomes in U.S. adults, focusing on those between the ages of 20 and 40. Using survey-weighted univariable and multivariable regression analyses, I examined whether diet quality (primarily assessed via the Dietary Inflammatory Index) and various sociodemographic (e.g., age, sex, race/ethnicity, poverty-income ratio), behavioral (e.g., alcohol, smoking), and clinical (e.g., body mass index [BMI], blood pressure, hemoglobin A1c [HbA1c], vitamin D level) features were associated with self-reported asthma, allergic rhinitis, inflammatory arthritis, eczema, food allergy, diabetes, and hypertension. In addition, I used exploratory machine learning models to identify predictors that may contribute to risk across these diseases.
I discussed this project with Dr. Robert Grundmeier (Professor of Pediatrics at CHOP and Penn, Department of Biomedical and Health Informatics) and Dr. Rich Tsui (Professor of Anesthesiology and Critical Care at CHOP and Penn, Department of Anesthesiology and Critical Care Medicine). With Dr. Grundmeier, I discussed cohort definitions, application of survey weights, and approach for regression analyses. With both Dr. Grundmeier and Dr. Tsui, I discussed the approach for machine learning analyses.
GitHub repository link: Final Project Repo
2 Introduction
Over the past several decades, rates of allergic conditions – including asthma, allergic rhinitis, food allergy, and eczema - have risen substantially worldwide, driven by mechanisms that remain only partially understood [1]. A concurrent increase has occurred in cardiometabolic diseases, such as obesity, hypertension, and diabetes [2]. Allergic, immunologic, and cardiometabolic conditions share comorbidity and, together, reflect immune dysregulation that occurs due to a complex interplay among genetic susceptibility, environmental exposures, and immune-metabolic pathways. There is growing recognition that dietary factors influence immune development beginning early in life and continuing into adulthood. Diet quality, nutrient intake patterns, and anthropometric factors such as adiposity influence inflammation, epithelial barrier function, and the gut microbiome, which in turn affect disease susceptibility [3,4]. While a modulatory role for diet has been been implicated in select allergic (e.g., asthma) and cardiometabolic (e.g., obesity) conditions [5,6], knowledge gaps remain about the potential modulatory role in other allergic and immunologic diseases. This knowledge could help guide how dietary approaches might be integrated into clinical care to prevent or manage these comorbidities.
NHANES is a continuous, nationally representative survey that integrates interviews, physical examinations, and laboratory assessments to characterize the health and nutritional status of the U.S. population. It provides detailed information on demographics, health behaviors, dietary recalls, biomarkers, anthropometrics, and clinical outcomes, making it well suited for population-level nutrition research [7]. The Dietary Inflammatory Index (DII) is an established, literature-derived score that quantifies the inflammatory potential of an individual’s diet based on intake of nutrients and food components known to influence systemic inflammation. To compute standardized DII values in NHANES, this study used the dietaryindexNDP package, which links NHANES dietary recall data with nutrient composition from the U.S. Department of Agriculture’s Food and Nutrient Database for Dietary Studies (FNDDS) and food group equivalents from the Food Patterns Equivalents Database (FPED) [8,9]. Leveraging these tools, I examined associations between dietary, sociodemographic, and clinical factors and a set of allergic-immunologic outcomes (asthma, allergic rhinitis, inflammatory arthritis, eczema, and food allergy) as well as cardiometabolic outcomes (diabetes and hypertension), focusing on adults aged 20-40 years. This age group represents a population that may be particularly amenable to dietary interventions aimed at disease prevention or management. In this study, I applied survey-weighted logistic regression along with exploratory machine learning models to assess whether dietary factors and related covariates are associated with risk across these various conditions that are broadly characterized by immune dysregulation.
This work is inherently multidisciplinary, bridging allergy–immunology, nutrition, epidemiology, and biomedical informatics. Informatics approaches were essential to this project, from extracting and integrating NHANES data to applying the survey design structure and implementing the regression and machine learning analyses. The integration of my training in allergy-immunology, clinical nutrition, epidemiology, statistics, and informatics uniquely equipped me to complete this project and interpret its findings from a cross-disciplinary perspective. This project was supported by mentorship from Drs. Robert Grundmeier and Rich Tsui, whose guidance was instrumental in developing cohort definitions, applying survey-weighted design, and shaping the analytic approach for both the regression and machine learning analyses.
3 Methods
Data source, cohort extraction, and data harmonization
Using NHANES, I performed a cross-sectional analysis focused on non-pregnant adults ages 20-40 years from 2005 to 2012. This age and time range were selected to target a younger adult population that may be more amenable to dietary interventions for prevention or management of chronic diseases, to avoid the increasing medical complexity associated with older age, and to mitigate missingness of key variables that were either relatively more missing or entirely absent in other NHANES cycles. I pooled four continuous two-year NHANES cycles (2005-2006, 2007-2008, 2009-2010, and 2011-2012). For the main analyses (asthma, allergic rhinitis, inflammatory arthritis, diabetes, hypertension), data from the full pooled period (2005–-2012) were used. Because eczema and food allergy were only measured in select cycles, I additionally constructed time-restricted subcohorts for eczema (2005-2006) and food allergy (2007-2010).
Covariates included sociodemographic variables (e.g., age, sex, race/ethnicity), clinical characteristics (e.g., BMI, waist-to-height ratio, vitamin D [25(OH)D] level), and self-reported allergic-immunologic conditions (asthma, allergic rhinitis, inflammatory arthritis [i.e., rheumatoid or psoriatic arthritis], eczema, food allergy) and cardiometabolic diseases (diabetes, hypertension). Two-year survey weights were rescaled to generate weights for analysis of pooled NHANES cycles. Data from multiple NHANES modules were merged into an analytic dataset, and variables were harmonized by recoding them into consistent categories across cycles. Using the dietaryindexNDP package, the Dietary Inflammatory Index (DII) was computed based on 24-hour dietary recall data and was represented as both as a continuous score (with and without alcohol included) and as quartiles (Q1–Q4), ranging from most anti-inflammatory to most pro-inflammatory, as previously described [10]. To reduce the influence of extreme values and to create clinically meaningful comparison groups, select variables (e.g., BMI, vitamin D levels) were converted into quartiles before modeling. I performed data manipulation with the tidyverse package collection, formatted outputs with broom and scales packages, generated figures using the ggplot2 package, and generated tables using the gt package.
All extracted variables, their types (i.e., continuous, categorical, binary), distributions, and missingness are summarized in Table 1. In general, for sociodemographic and behavioral variables, missingness was low, with most variables (e.g., age, sex, education, insurance, smoking, physical activity, family history) showing a missingness of less than 2%, while other variables (e.g. race/ethnicity, alcohol use, poverty-income ratio, food security) showed a range of missingness between 7% and 20%. For cardiometabolic and nutrient labs, the missingness rate was about 8% to 10%. The rate of missingness for primary clinical outcomes was quite low (less than 1%). For food allergy and eczema, the missingness (calculated relative to the full cohort) was expected given the unavailability of these clinical outcomes in many survey cycles. Hence time-restricted comparator cohorts were used for secondary analyses for these specific clinical outcomes. Subsequent regression and machine learning analyses used complete-case data.
Weighted prevalence analysis
Using the survey package, the survey-weighted prevalence of all allergic, immunologic, and cardiometabolic conditions was determined. In addition, to assess how prevalence varied by dietary inflammatory status, survey-weighted prevalence was determined within DII quartiles (Q1-Q4).
Univariable and multivariable regression analyses
Survey-weighted regression analyses were performed using the svyglm function in the survey package. After assessing models for overdispersion, quasibinomial survey-weighted logistic regression was employed. First, univariable analyses were performed to examine associations between each outcome and a core set of sociodemographic, behavioral, clinical, and dietary predictors. An additional exploratory set of univariable analyses evaluated associations between each outcome and individual DII nutrient components. For all univariable models, odds ratios (ORs) with 95% confidence intervals (CIs) were reported, and p values were adjusted using the Benjamini-Hochberg method to account for multiple comparisons.
Next, survey-weighted multivariable regression analyses were performed to evaluate adjusted associations between continuous DII and each disease outcome. Three models were constructed. Model 1 adjusted for age and sex. Model 2 additionally adjusted for race/ethnicity, insurance status, poverty-income ratio, and education level. Model 3 further adjusted for BMI category, physical activity, and smoking status. For each model, the OR, 95% CIs and p values were determined, and results were visualized using Forest plots.
Effect modification analyses
I assessed whether the association between continuous DII and each disease outcome differed by sex, BMI category, or vitamin D category using survey-weighted logistic regression models that included all Model 3 covariates. Sex contributed one interaction term (DII x sex), whereas the multi-category BMI and vitamin D variables contributed multiple interaction terms with DII. I determined stratum-specific ORs and 95% CIs for DII within each modifier level. Statistical significance of effect modification was evaluated using Wald chi-square tests combining all interaction coefficients for each modifier.
Machine learning analyses
I developed exploratory machine learning models to predict disease outcomes (asthma, allergic rhinitis, inflammatory arthritis, hypertension, diabetes, eczema, and food allergy) using the tidymodels set of packages for preprocessing, data sampling, model tuning, and performance evaluation. Two predictor sets were examined. Set 1 included core sociodemographic, behavioral, anthropometric, and clinical variables as well as continuous DII scores (23 variables in total). Set 2 included individual nutrient predictors (26 variables in total). Four algorithms were trained for each outcome, namely logistic regression (LR), LASSO-penalized logistic regression (glmnet package), Random Forest (RF; randomForest package), and XGBoost (XGB; xgboost package). Logistic regression was chosen as an interpretable baseline model, LASSO was chosen for its ability to shrink coefficients and perform variable selection, and RF and XGB were included to capture complex non-linear relationships.
For each outcome and predictor set, data were split into a training set (80%) and a test set (20%). Within the training set, 10-fold cross-validation was used for preprocessing and hyperparameter tuning. Preprocessing steps were implemented using tidymodels recipes. Hyperparameters for LASSO (penalty), RF (mtry), and XGB (number of trees, tree depth, learning rate) were tuned using the cross-validated area under the receiver operating characteristic curve (AUROC). Model performance on the test set was quantified using AUROC. To identify the topmost predictors, I computed model-specific variable importance scores using the vip package. For LR models, variable importance referred to the absolute value of standardized regression coefficients. For LASSO models, variable importance was based on the absolute value of the penalized coefficients after shrinkage. For RF models, variable importance was derived from tree-based split metrics (i.e., increase in node purity associated with each variable). Finally, for XGB, variable importance reflected the gain, i.e., how much each predictor improved the model’s accuracy when used in the boosted trees.
Code to install and load packages
# Load packages
library(broom)
library(dietaryindexNDP)
library(ggplot2)
library(glmnet)
library(gt)
library(nhanesA)
library(randomForest)
library(scales)
library(survey)
library(tidymodels)
library(tidyverse)
library(xgboost)
library(vip)Code to generate master analytic dataset and summarize variable types, distributions, and missingness
The code below was used to generate the master analytic dataset for subsequent regression and machine learning analyses. Briefly, custom helper functions were used with the nhanesA package to streamline downloading NHANES modules of interest (i.e., demographics, psychosocial variables, medical conditions, anthropometrics, laboratory results, dietary variables, etc.) across four survey cycles (2005–2012). These were helpful as I explored various iterations of surveys cycles and variables to optimize the cohort parameters that I ultimately selected for this project. The set of tidyverse packages was use to clean and merge these data. I then used the dietaryindexNDP package to compute Dietary Inflammatory Index (DII) scores (with and without alcohol) and individual nutrient components. All cleaned modules were joined by participant ID into a single dataset. In this dataset, I harmonized disease outcomes (i.e., asthma, allergic rhinitis, inflammatory arthritis, eczema, food allergy, diabetes, hypertension) and converted selected variables (e.g., BMI, vitamin D level) into clinically meaningful categories. The cohort was restricted to non-pregnant adults 20-40 years of age, and NHANES sampling weights were rescaled to account for pooling of survey cycles. Using the survey package, I then generated outcome-specific analytic subsets with corresponding MEC-weighted survey designs for subsequent regression analyses. Finally, I created a data dictionary-style table (Table 1) summarizing all variables, their types, sample sizes, value ranges, categorical levels, and missingness. Additional coding details are included as annotations within the code chunk below.
# Helper functions-----------------------------------------------------------------------------
## Helper function to load NHANES module across survey cycles, including options to restrict to select variables and provide status summaries
load_nhanes_module <- function(module, cycles_vec = cycles,
keep_vars = NULL,
quiet = TRUE) { # set to FALSE for debugging messages
module_years <- paste0(module, "_", cycles_vec)
status <- tibble(
file_tag = module_years,
ok = FALSE,
n_rows = NA_integer_,
note = NA_character_
)
dfs <- vector("list", length(module_years))
for (i in seq_along(module_years)) {
x <- module_years[i]
out <- tryCatch(
{
if (!quiet) {
message("Attempting to download: ", x)
}
df <- nhanes(x) |> as.data.frame()
if (!is.null(keep_vars)) {
df <- df |> select(any_of(keep_vars))
}
n <- nrow(df)
status$n_rows[i] <- n
if (n == 0) {
status$ok[i] <- FALSE
status$note[i] <- "Downloaded, 0 rows"
if (!quiet) {
warning("File ", x, " downloaded but has 0 rows.")
}
df <- NULL
} else {
status$ok[i] <- TRUE
status$note[i] <- "Loaded"
df$file_tag <- x
}
df
},
error = function(e) {
status$ok[i] <- FALSE
status$note[i] <- paste("ERROR:", conditionMessage(e))
if (!quiet) {
warning("File ", x, " could not be downloaded: ", conditionMessage(e))
}
return(NULL)
}
)
dfs[[i]] <- out
}
dfs_nonempty <- dfs[!vapply(dfs, is.null, logical(1))]
if (!quiet) {
message("\nSummary for module ", module, ":")
print(status)
}
if (length(dfs_nonempty) == 0) {
stop("No cycles successfully downloaded for module: ", module)
}
out <- bind_rows(dfs_nonempty)
### Missing variables filled with NA
if (!is.null(keep_vars)) {
missing_vars <- setdiff(keep_vars, names(out))
if (length(missing_vars) > 0) {
for (v in missing_vars) {
out[[v]] <- NA_real_
}
}
}
out
}
## Helper function to convert yes/no (1/2) variables into 0/1
to01 <- function(x) {
x <- as.character(x)
case_when(
x %in% c("1", "Yes", "YES", "yes") ~ 1,
x %in% c("2", "No", "NO", "no") ~ 0,
TRUE ~ NA_real_
)
}
## Helper function to convert survey cycle variable survey cycle labels
cycle_label <- function(s) {
years <- str_extract(as.character(s), "\\d{4}-\\d{4}")
case_when(
!is.na(years) ~ years,
TRUE ~ recode(
as.numeric(s),
`4` = "2005-2006",
`5` = "2007-2008",
`6` = "2009-2010",
`7` = "2011-2012",
.default = NA_character_
)
)
}
## Helper function to summarize variable availability and missingness by cycle
check_var_availability <- function(data, vars = NULL) {
if (is.null(vars)) {
vars <- setdiff(
names(data),
c(
"seqn", "wtint2yr", "wtmec2yr", "wtint8yr", "wtmec8yr",
"sdmvpsu", "sdmvstra", "sddsrvyr", "cycle_str"
)
)
}
map_dfr(vars, function(v) {
data |>
group_by(cycle = sddsrvyr) |>
summarise(
variable = v,
total = n(),
n_nonmissing = sum(!is.na(.data[[v]])),
n_missing = sum(is.na(.data[[v]])),
prop_missing = mean(is.na(.data[[v]])),
.groups = "drop"
)
}) |>
arrange(variable, cycle)
}
## Restrict analysis NHANES cycles spanning 2005–2012 (i.e., D–G)
cycles <- c("D", "E", "F", "G")
# Load NHANES modules--------------------------------------------------------------------------
## Load demographic variables within DEMO module (i.e., demographics, survey design variables)
demo_keep <- c(
"SEQN", # Participant ID
"RIDAGEYR", # Age
"RIAGENDR", # Sex (1 = Male, 2 = Female)
"RIDRETH1", # Race/ethnicity (5-level)
"DMDEDUC2", # Education level
"INDFMPIR", # Poverty-income ratio
"SDDSRVYR", # Survey cycle indicator
"SDMVPSU", # Primary sampling unit (PSU) for survey design
"SDMVSTRA", # Stratification variable for survey design
"WTINT2YR", # 2-year interview weight
"WTMEC2YR", # 2-year MEC exam weight
"RIDEXPRG" # Pregnancy status for females
)
demo <- load_nhanes_module("DEMO", keep_vars = demo_keep)
## Load medical variables within MCQ module (i.e., asthma, family history of asthma, arthritis)
mcq_keep <- c(
"SEQN",
"MCQ010", # Asthma
"MCQ300B", # Family history asthma
"MCQ160A", # Arthritis yes/no
"MCQ190", # Arthritis type (2005–08)
"MCQ191", # Arthritis type (2009–10)
"MCQ195" # Arthritis type (2011–12)
)
mcq <- load_nhanes_module("MCQ", keep_vars = mcq_keep)
## Load eczema and allergic rhinitis data for 2005–2006 (AGQ module)
agq_d <- load_nhanes_module(
"AGQ",
cycles_vec = "D",
keep_vars = c("SEQN","AGQ010","AGQ180")
)
## Load allergic rhinitis data for 2007–2012 (RDQ module)
rdq <- load_nhanes_module(
"RDQ",
cycles_vec = c("E","F","G"),
keep_vars = c("SEQN","AGQ030")
)
## Load food allergy data (DBQ920) for 2007–2010
dbq <- load_nhanes_module("DBQ", keep_vars = c("SEQN","DBQ920"))
## Load diabetes status (DIQ010)
diq <- load_nhanes_module("DIQ", keep_vars = c("SEQN","DIQ010"))
## Load self-reported hypertension (BPQ020)
bpq <- load_nhanes_module("BPQ", keep_vars = c("SEQN","BPQ020"))
## Load anthropometrics (BMI, waist, height, weight)
bmx <- load_nhanes_module(
"BMX",
keep_vars = c("SEQN","BMXBMI","BMXWAIST","BMXHT","BMXWT")
)
## Load individual systolic and diastolic blood pressure readings
bp <- load_nhanes_module(
"BPX",
keep_vars = c(
"SEQN",
"BPXSY1","BPXSY2","BPXSY3","BPXSY4",
"BPXDI1","BPXDI2","BPXDI3","BPXDI4"
)
)
## Load serum vitamin D [25(OH)D] measurements
vid <- load_nhanes_module(
"VID",
keep_vars = c("SEQN", "LBXVIDMS", "LBXVIDLC", "LBDVIDMS")
)
## Load lifetime and current smoking status
smq <- load_nhanes_module("SMQ", keep_vars = c("SEQN","SMQ020","SMQ040"))
## Load health insurance coverage
hiq <- load_nhanes_module("HIQ", keep_vars = c("SEQN","HIQ011"))
## Load household food security
fsq <- load_nhanes_module("FSQ", keep_vars = c("SEQN","FSDHH"))
## Load alcohol consumption (ever drinker status, ALQ101)
alq <- load_nhanes_module("ALQ", keep_vars = c("SEQN","ALQ101","ALQ130"))
## Load HbA1c
hba1c <- load_nhanes_module("GHB", keep_vars = c("SEQN","LBXGH"))
## Load total and HDL cholesterol
tchol <- load_nhanes_module("TCHOL", keep_vars = c("SEQN","LBXTC"))
hdl <- load_nhanes_module("HDL", keep_vars = c("SEQN","LBDHDD"))
## Load physical activity variables
paq_keep <- c("SEQN","PAQ650","PAQ665","PAD200","PAD320")
paq <- load_nhanes_module("PAQ", keep_vars = paq_keep)
# Dietary Inflammatory Index (DII)-------------------------------------------------------------
## Define NHANES cycles for which DII is available (2005–2012)
dii_cycles <- c("0506", "0708", "0910", "1112")
## Obtain DII results for a single NHANES cycle (day 1 only)
get_dii_cycle <- function(cyc) {
message("Obtain DII for NHANES cycle ", cyc, " (Day 1)...")
DII_NHANES_PLUS_RESULT(
NHANESCycle = cyc,
DAY = "first"
) |>
mutate(cycle_code_ndp = cyc)
}
## Obtain and pool DII data across all relevant cycles
dii_all <- dii_cycles |>
map_dfr(get_dii_cycle)
## Retain DII total scores (with and without alcohol) and nutrient components
dii_clean <- dii_all |>
transmute(
seqn = SEQN,
dii_etoh_raw = DII_ALL, # DII including alcohol
dii_raw = DII_NOETOH, # DII excluding alcohol
kcal = KCAL,
carb = CARB,
protein = PROTEIN,
totalfat = TOTALFAT,
satfat = SATFAT,
pufa = PUFA,
mufa = MUFA,
n3fat = N3FAT,
n6fat = N6FAT,
choles = CHOLES,
vitA = VITA,
bcarotene = BCAROTENE,
vitC = VITC,
vitE = VITE,
vitB6 = VITB6,
vitB12 = VITB12,
folicacid = FOLICACID,
thiamin = THIAMIN,
riboflavin = RIBOFLAVIN,
niacin = NIACIN,
iron = IRON,
mg = MG,
zn = ZN,
se = SE,
fiber = FIBER,
caffeine = CAFFEINE
) |>
mutate(
dii = dii_raw, # DII without alcohol (main exposure)
dii_etoh = dii_etoh_raw # DII including alcohol (secondary)
)
## Define full set of DII component nutrient variables for quartile derivation
dii_component_vars <- c(
"kcal", "carb", "protein", "totalfat", "satfat",
"pufa", "mufa", "n3fat", "n6fat", "choles",
"vitA", "bcarotene", "vitC", "vitE",
"vitB6", "vitB12", "folicacid",
"thiamin", "riboflavin", "niacin",
"iron", "mg", "zn", "se",
"fiber", "caffeine"
)
# Data cleaning and harmonization -------------------------------------------------------------
## Harmonize physical activity variables across NHANES cycles
paq_clean <- paq |>
transmute(
seqn = SEQN,
paq650 = PAQ650, # Vigorous recreational activity ≥10 minutes (2007+)
paq665 = PAQ665, # Moderate recreational activity ≥10 minutes (2007+)
pad200 = PAD200, # Vigorous activity ≥10 minutes over past 30 days (2005–06)
pad320 = PAD320 # Moderate activity ≥10 minutes over past 30 days (2005–06)
)
## Clean demographic and design variables and derive a numeric cycle code
demo_clean <- demo |>
mutate(
sddsrvyr_label = as.character(SDDSRVYR),
sddsrvyr = case_when(
grepl("2005-2006", sddsrvyr_label) ~ 4,
grepl("2007-2008", sddsrvyr_label) ~ 5,
grepl("2009-2010", sddsrvyr_label) ~ 6,
grepl("2011-2012", sddsrvyr_label) ~ 7,
TRUE ~ NA_real_
)
) |>
transmute(
seqn = SEQN,
age = as.numeric(RIDAGEYR), # Age (years)
sex_raw = RIAGENDR, # Sex
race_raw = RIDRETH1, # Race/ethnicity (1–5)
educ_raw = DMDEDUC2, # Education level
pir = INDFMPIR, # Poverty-income ratio
sddsrvyr,
cycle_str = cycle_label(sddsrvyr), # Cycle label
sdmvpsu = SDMVPSU, # PSU for survey design
sdmvstra = SDMVSTRA, # Stratum for survey design
wtint2yr = WTINT2YR, # Interview weight (2-year)
wtmec2yr = WTMEC2YR, # MEC weight (2-year)
preg = RIDEXPRG # Pregnancy status
)
## Clean the medical conditions module and derive 0/1 indicators for asthma, family history of asthma, and arthritis
mcq_clean <- mcq |>
transmute(
seqn = SEQN,
asthma_raw = MCQ010, # Asthma
fh_asthma_raw = MCQ300B, # Family history asthma
arthritis_raw = MCQ160A, # Ever arthritis
mcq190 = MCQ190, # Arthritis type (2005–08)
mcq191 = MCQ191, # Arthritis type (2009–10)
mcq195 = MCQ195 # Arthritis type (2011–12)
) |>
mutate(
asthma = to01(asthma_raw),
fh_asthma = to01(fh_asthma_raw),
arthritis_ever = to01(arthritis_raw)
)
## Derive eczema status restricted to the 2005–2006 cycle
eczema_clean <- demo_clean |>
select(seqn, sddsrvyr) |>
left_join(
agq_d |>
transmute(
seqn = SEQN,
agq180_ecz = to01(AGQ180) # Eczema code
),
by = "seqn"
) |>
mutate(
ad = if_else(sddsrvyr == 4, agq180_ecz, NA_real_)
) |>
select(seqn, ad)
## Derive food allergy status restricted to the 2007–2010 cycles
fa_clean <- demo_clean |>
select(seqn, sddsrvyr) |>
left_join(
dbq |>
transmute(
seqn = SEQN,
dbq920_fa = to01(DBQ920) # Food allergy code
),
by = "seqn"
) |>
mutate(
fa = if_else(sddsrvyr %in% c(5, 6), dbq920_fa, NA_real_)
) |>
select(seqn, fa)
## Harmonize allergic rhinitis across 2005–2006 and 2007–2012
ar_clean <- demo_clean |>
select(seqn, sddsrvyr) |>
left_join(
agq_d |>
transmute(seqn = SEQN, ar_agq010 = to01(AGQ010)), # AR code (2005–06)
by = "seqn"
) |>
left_join(
rdq |>
transmute(seqn = SEQN, ar_agq030_rdq = to01(AGQ030)), # AR code (2007–12)
by = "seqn"
) |>
mutate(
ar = case_when(
sddsrvyr == 4 ~ ar_agq010,
sddsrvyr %in% c(5, 6, 7) ~ ar_agq030_rdq,
TRUE ~ NA_real_
)
) |>
select(seqn, ar)
## Derive a 0/1 diabetes indicator from DIQ010
diq_clean <- diq |>
transmute(
seqn = SEQN,
dm2_diag_chr = as.character(DIQ010)
) |>
mutate(
dm2 = case_when(
dm2_diag_chr %in% c("1", "Yes") ~ 1,
dm2_diag_chr %in% c("2","No","3","Borderline") ~ 0,
TRUE ~ NA_real_
)
) |>
select(seqn, dm2)
## Derive 0/1 hypertension indicator from BPQ020
bpq_clean <- bpq |>
transmute(
seqn = SEQN,
htn_chr = as.character(BPQ020)
) |>
mutate(
htn = case_when(
htn_chr %in% c("1","Yes") ~ 1,
htn_chr %in% c("2","No") ~ 0,
TRUE ~ NA_real_
)
) |>
select(seqn, htn)
## Clean anthropometrics and derive standard measures
bmx_clean <- bmx |>
transmute(
seqn = SEQN,
bmi = BMXBMI, # Body mass index (kg/m^2)
waist_cm = BMXWAIST, # Waist circumference (cm)
height_cm = BMXHT, # Standing height (cm)
weight_kg = BMXWT # Weight (kg)
)
## Compute mean systolic and diastolic blood pressures across readings
bp_clean <- bp |>
mutate(
sbp_mean = rowMeans(across(BPXSY1:BPXSY4), na.rm = TRUE),
dbp_mean = rowMeans(across(BPXDI1:BPXDI4), na.rm = TRUE),
dbp_mean = if_else(dbp_mean == 0, NA_real_, dbp_mean)
) |>
transmute(
seqn = SEQN,
sbp_mean = if_else(is.nan(sbp_mean), NA_real_, sbp_mean),
dbp_mean = if_else(is.nan(dbp_mean), NA_real_, dbp_mean)
)
## Harmonize vitamin D [25(OH)D] across laboratory methods, taking first non-missing value
vid_clean <- vid |>
mutate(
vitD25oh = coalesce(LBXVIDMS, LBXVIDLC, LBDVIDMS)
) |>
transmute(
seqn = SEQN,
vitD25oh = vitD25oh
)
## Derive smoking status (never, former, current)
smq_clean <- smq |>
transmute(
seqn = SEQN,
smoked_100 = as.numeric(SMQ020),
smoke_now = as.numeric(SMQ040)
) |>
mutate(
smoking = case_when(
smoked_100 == 2 ~ "Never",
smoked_100 == 1 & smoke_now == 3 ~ "Former",
smoked_100 == 1 & smoke_now %in% c(1,2) ~ "Current",
TRUE ~ NA_character_
)
)
## Derive insurance indicator using HIQ011
hiq_clean <- hiq |>
transmute(
seqn = SEQN,
insured = to01(HIQ011) # 1 = insured, 0 = uninsured
)
## Classify household food security into four ordered categories
fsq_clean <- fsq |>
transmute(
seqn = SEQN,
foodsec_raw = as.numeric(FSDHH),
food_security = case_when(
foodsec_raw == 1 ~ "Full",
foodsec_raw == 2 ~ "Marginal",
foodsec_raw == 3 ~ "Low",
foodsec_raw == 4 ~ "Very low",
TRUE ~ NA_character_
)
)
## Alcohol indicator based on ever having more than 12 drinks in any one year (ALQ101)
alq_clean <- alq |>
transmute(
seqn = SEQN,
any_alcohol = case_when(
ALQ101 %in% c(1, "1", "Yes") ~ 1,
ALQ101 %in% c(2, "2", "No") ~ 0,
TRUE ~ NA_real_
)
)
## Retain HbA1c
hba1c_clean <- hba1c |>
transmute(
seqn = SEQN,
hba1c = LBXGH
)
## Retain total and HDL cholesterol
tchol_clean <- tchol |> transmute(seqn = SEQN, tchol = LBXTC)
hdl_clean <- hdl |> transmute(seqn = SEQN, hdl = LBDHDD)
# Merge modules and define analytic cohort ----------------------------------------------------
## Merge all cleaned modules to construct a unified NHANES dataset
nhanes_all <- demo_clean |>
left_join(mcq_clean, by = "seqn") |>
left_join(eczema_clean, by = "seqn") |>
left_join(fa_clean, by = "seqn") |>
left_join(ar_clean, by = "seqn") |>
left_join(bmx_clean, by = "seqn") |>
left_join(bp_clean, by = "seqn") |>
left_join(vid_clean, by = "seqn") |>
left_join(smq_clean, by = "seqn") |>
left_join(hiq_clean, by = "seqn") |>
left_join(fsq_clean, by = "seqn") |>
left_join(alq_clean, by = "seqn") |>
left_join(hba1c_clean, by = "seqn") |>
left_join(tchol_clean, by = "seqn") |>
left_join(hdl_clean, by = "seqn") |>
left_join(dii_clean, by = "seqn") |>
left_join(diq_clean, by = "seqn") |>
left_join(bpq_clean, by = "seqn") |>
left_join(paq_clean, by = "seqn")
## Restrict the analytic cohort and derive all final variables used in regression and machine learning analyses
nhanes_analytic <- nhanes_all |>
# Restrict to adults 20–40 years old, 2005–2012
filter(
!is.na(sddsrvyr),
sddsrvyr %in% 4:7,
age >= 20, age <= 40
) |>
# Exclude pregnant participants at the exam
filter(
is.na(preg) |
!preg %in% c("Yes, positive lab pregnancy test or self-reported pregnant at exam")
) |>
mutate(
# Multi-cycle weights (pooled 8-year weights over 4 cycles)
wtint8yr = wtint2yr / length(cycles),
wtmec8yr = wtmec2yr / length(cycles),
# Sex factor
sex = case_when(
sex_raw %in% c(1, "1") ~ "Male",
sex_raw %in% c(2, "2") ~ "Female",
as.character(sex_raw) == "Male" ~ "Male",
as.character(sex_raw) == "Female" ~ "Female",
TRUE ~ NA_character_
),
sex = factor(sex, levels = c("Male", "Female")),
# Race/ethnicity factor (RIDRETH1 1–5)
race = case_when(
race_raw %in% c(3, "3", "Non-Hispanic White") ~ "Non-Hispanic White",
race_raw %in% c(4, "4", "Non-Hispanic Black") ~ "Non-Hispanic Black",
race_raw %in% c(1, "1", "Mexican American") ~ "Mexican American",
race_raw %in% c(2, "2", "Other Hispanic") ~ "Other Hispanic",
race_raw %in% c(5, "5") ~ "Other/multiracial",
TRUE ~ NA_character_
),
race = factor(
race,
levels = c(
"Non-Hispanic White",
"Non-Hispanic Black",
"Mexican American",
"Other Hispanic",
"Other/multiracial"
)
),
# Education factor (collapsed)
educ_factor_raw = factor(educ_raw),
educ = case_when(
educ_factor_raw %in% c(
"Less Than 9th Grade",
"Less than 9th grade"
) ~ "<9 years",
educ_factor_raw %in% c(
"9-11th Grade (Includes 12th grade with no diploma)",
"9-11th grade (Includes 12th grade with no diploma)",
"High School Grad/GED or Equivalent",
"High school graduate/GED or equivalent"
) ~ "9–12 years",
educ_factor_raw %in% c(
"Some College or AA degree",
"Some college or AA degree",
"College Graduate or above",
"College graduate or above"
) ~ ">12 years",
TRUE ~ NA_character_
),
educ = factor(
educ,
levels = c("<9 years","9–12 years",">12 years")
),
# Smoking factor
smoking = factor(
smoking,
levels = c("Never", "Former", "Current")
),
# Physical activity factor (Sedentary, Moderate, Intense), harmonized across cycles
phys_act = case_when(
# 2005–2006 (sddsrvyr == 4): PAD200 / PAD320
sddsrvyr == 4 & to01(pad200) == 1 ~ "Intense",
sddsrvyr == 4 & to01(pad200) != 1 & to01(pad320) == 1 ~ "Moderate",
sddsrvyr == 4 & to01(pad200) == 0 & to01(pad320) == 0 ~ "Sedentary",
# 2007–2012 (sddsrvyr %in% 5:7): PAQ650 / PAQ665
sddsrvyr %in% c(5, 6, 7) & as.character(paq650) == "Yes" ~ "Intense",
sddsrvyr %in% c(5, 6, 7) &
as.character(paq650) == "No" & as.character(paq665) == "Yes" ~ "Moderate",
sddsrvyr %in% c(5, 6, 7) &
as.character(paq650) == "No" & as.character(paq665) == "No" ~ "Sedentary",
TRUE ~ NA_character_
),
phys_act = factor(
phys_act,
levels = c("Sedentary","Moderate","Intense")
),
# Food security factor
food_sec = factor(
food_security,
levels = c("Full","Marginal","Low","Very low"),
ordered = TRUE
),
# Insurance factor
insur = factor(
case_when(
insured == 1 ~ "Yes",
insured == 0 ~ "No",
TRUE ~ NA_character_
),
levels = c("No", "Yes")
),
# Alcohol factor (no/yes) based on any_alcohol from ALQ101
alcohol = factor(
case_when(
any_alcohol == 0 ~ "No",
any_alcohol == 1 ~ "Yes",
TRUE ~ NA_character_
),
levels = c("No", "Yes")
),
# Validity flags for outcomes with restricted survey windows
ad_valid = sddsrvyr == 4, # Eczema only in 2005–06
fa_valid = sddsrvyr %in% c(5, 6), # Food allergy only in 2007–10
# Inflammatory arthritis based on multiple arthritis type questions
inflam_arthritis = {
mcq190_chr <- as.character(mcq190)
mcq191_chr <- as.character(mcq191)
mcq195_chr <- as.character(mcq195)
case_when(
# Denial of ever having arthritis
arthritis_ever == 0 ~ 0,
# 2005–2006 and 2007–2008: MCQ190 arthritis type
arthritis_ever == 1 & sddsrvyr %in% c(4, 5) &
grepl("Rhe", mcq190_chr, ignore.case = TRUE) ~ 1,
arthritis_ever == 1 & sddsrvyr %in% c(4, 5) &
grepl("Ost|Osteo|Oth", mcq190_chr, ignore.case = TRUE) ~ 0,
# 2009–2010: MCQ191 arthritis type
arthritis_ever == 1 & sddsrvyr == 6 &
(grepl("Rhe", mcq191_chr, ignore.case = TRUE) |
grepl("Pso", mcq191_chr, ignore.case = TRUE)) ~ 1,
arthritis_ever == 1 & sddsrvyr == 6 &
grepl("Ost|Osteo|Oth", mcq191_chr, ignore.case = TRUE) ~ 0,
# 2011–2012: MCQ195 arthritis type
arthritis_ever == 1 & sddsrvyr == 7 &
(grepl("Rhe", mcq195_chr, ignore.case = TRUE) |
grepl("Pso", mcq195_chr, ignore.case = TRUE)) ~ 1,
arthritis_ever == 1 & sddsrvyr == 7 &
grepl("Ost|Osteo|Oth", mcq195_chr, ignore.case = TRUE) ~ 0,
TRUE ~ NA_real_
)
},
# Final inflammatory arthritis, diabetes, and hypertension indicators (0/1)
arthritis = inflam_arthritis,
diabetes = dm2,
htn = htn,
# Family history of asthma (0/1)
fh_asthma = fh_asthma,
# Anthropometric ratio and non-HDL cholesterol
whtr = waist_cm / height_cm, # Waist-to-height ratio
non_hdl = if_else(!is.na(tchol) & !is.na(hdl), tchol - hdl, NA_real_), # Non-HDL
# Vitamin D categories
vitD25oh_cat = case_when(
is.na(vitD25oh) ~ NA_character_,
vitD25oh < 20 ~ "<20 (deficient)",
vitD25oh >= 20 & vitD25oh < 30 ~ "20–29 (insufficient)",
vitD25oh >= 30 ~ "≥30 (sufficient)"
),
vitD25oh_cat = factor(
vitD25oh_cat,
levels = c("<20 (deficient)",
"20–29 (insufficient)",
"≥30 (sufficient)"),
ordered = TRUE
),
# DII quartiles (from most anti-inflammatory to most pro-inflammatory)
dii_quart = ntile(dii, 4),
dii_quart = factor(
dii_quart,
levels = c(1, 2, 3, 4),
labels = c(
"Q1 (most anti-inflammatory)",
"Q2",
"Q3",
"Q4 (most pro-inflammatory)"
),
ordered = TRUE
),
# Pregnancy status (derived categorical indicator)
preg_flag = case_when(
preg == "Yes, positive lab pregnancy test or self-reported pregnant at exam" ~ 1,
preg %in% c("SP not pregnant at exam",
"The participant was not pregnant at exam") ~ 0,
preg %in% c("Cannot ascertain if SP is pregnant at exam",
"Cannot ascertain if the participant is pregnant at exam") ~ NA_real_,
TRUE ~ NA_real_
),
preg_status = factor(
case_when(
preg_flag == 1 ~ "Pregnant",
preg_flag == 0 ~ "Not pregnant",
TRUE ~ NA_character_
),
levels = c("Not pregnant", "Pregnant")
),
# Quartiles for DII component nutrients (keep only quartile factors downstream)
across(
all_of(dii_component_vars),
~ ntile(., 4),
.names = "{.col}_quart"
),
across(
all_of(paste0(dii_component_vars, "_quart")),
~ factor(
.,
levels = c(1, 2, 3, 4),
labels = c("Q1", "Q2", "Q3", "Q4"),
ordered = TRUE
)
),
# BMI categories
bmi_cat = case_when(
is.na(bmi) ~ NA_character_,
bmi < 18.5 ~ "Underweight (<18.5)",
bmi < 25 ~ "Normal (18.5–24.9)",
bmi < 30 ~ "Overweight (25–29.9)",
bmi < 35 ~ "Obesity I (30–34.9)",
bmi < 40 ~ "Obesity II (35–39.9)",
TRUE ~ "Obesity III (≥40)"
),
bmi_cat = factor(
bmi_cat,
levels = c(
"Underweight (<18.5)",
"Normal (18.5–24.9)",
"Overweight (25–29.9)",
"Obesity I (30–34.9)",
"Obesity II (35–39.9)",
"Obesity III (≥40)"
),
ordered = TRUE
),
# Waist-to-height ratio categories
whtr_cat = case_when(
is.na(whtr) ~ NA_character_,
whtr < 0.5 ~ "<0.5",
whtr < 0.6 ~ "0.5–0.59",
TRUE ~ "≥0.6"
),
whtr_cat = factor(
whtr_cat,
levels = c("<0.5", "0.5–0.59", "≥0.6"),
ordered = TRUE
),
# SBP categories
sbp_cat = case_when(
is.na(sbp_mean) ~ NA_character_,
sbp_mean < 120 ~ "<120",
sbp_mean < 130 ~ "120–129",
sbp_mean < 140 ~ "130–139",
sbp_mean < 160 ~ "140–159",
TRUE ~ "≥160"
),
sbp_cat = factor(
sbp_cat,
levels = c("<120", "120–129", "130–139", "140–159", "≥160"),
ordered = TRUE
),
# DBP categories
dbp_cat = case_when(
is.na(dbp_mean) ~ NA_character_,
dbp_mean < 80 ~ "<80",
dbp_mean < 90 ~ "80–89",
dbp_mean < 100 ~ "90–99",
TRUE ~ "≥100"
),
dbp_cat = factor(
dbp_cat,
levels = c("<80", "80–89", "90–99", "≥100"),
ordered = TRUE
),
# Numeric variables for ordered categories (for trend tests and ML)
dii_quart_trend = if_else(!is.na(dii_quart), as.numeric(dii_quart) - 1, NA_real_),
food_sec_score = if_else(!is.na(food_sec), as.numeric(food_sec) - 1, NA_real_),
vitD25oh_cat_trend = if_else(!is.na(vitD25oh_cat), as.numeric(vitD25oh_cat) - 1, NA_real_),
bmi_cat_trend = if_else(!is.na(bmi_cat), as.numeric(bmi_cat) - 1, NA_real_),
whtr_cat_trend = if_else(!is.na(whtr_cat), as.numeric(whtr_cat) - 1, NA_real_),
sbp_cat_trend = if_else(!is.na(sbp_cat), as.numeric(sbp_cat) - 1, NA_real_),
dbp_cat_trend = if_else(!is.na(dbp_cat), as.numeric(dbp_cat) - 1, NA_real_),
phys_act_trend = if_else(!is.na(phys_act), as.numeric(phys_act) - 1, NA_real_)
)
# Finalize analytic dataset for regression and ML ---------------------------------------------
## Construct final analytic dataset that contains all variables required for both regression and machine learning analyses
nhanes_analytic <- nhanes_analytic |>
select(
# IDs and survey design
seqn,
sddsrvyr, cycle_str, sdmvpsu, sdmvstra,
wtint2yr, wtmec2yr, wtint8yr, wtmec8yr,
# Demographics
age, # Age
sex, # Sex (Male/Female)
race, # Race/ethnicity (5 categories)
educ, # Education (<9, 9–12, >12 years)
pir, # Poverty-income ratio
# Psychosocial and behavioral factors
smoking, # Smoking status (Never/Former/Current)
phys_act, # Physical activity (Sedentary/Moderate/Intense)
phys_act_trend, # Physical activity trend (0–2)
food_sec, # Food security (Full/Marginal/Low/Very low)
food_sec_score, # Food security score (0–3)
insur, # Health insurance (Yes/No)
any_alcohol, # Self-reported alcohol use (0/1; from ALQ101)
alcohol, # Self-reported alcohol use (No/Yes factor)
# Anthropometrics and cardiometabolic variables
bmi, # Body mass index (kg/m^2)
whtr, # Waist-to-height ratio
sbp_mean, # Mean systolic BP (mmHg)
dbp_mean, # Mean diastolic BP (mmHg)
tchol, # Total cholesterol (mg/dL)
hdl, # HDL cholesterol (mg/dL)
non_hdl, # Non-HDL cholesterol (mg/dL)
hba1c, # HbA1c (%)
# Anthropometric and BP categories
bmi_cat,
bmi_cat_trend, # BMI category trend (0–5)
whtr_cat,
whtr_cat_trend, # WHtR category trend (0–2)
sbp_cat,
sbp_cat_trend, # SBP category trend (0–4)
dbp_cat,
dbp_cat_trend, # DBP category trend (0–3)
# Vitamin D and DII summary scores
vitD25oh, # Serum Vitamin D [25(OH)D] (nmol/L)
vitD25oh_cat, # Vitamin D categories
vitD25oh_cat_trend, # Vitamin D category trend (0–2)
dii, # DII (no alcohol)
dii_quart, # DII quartiles (ordered)
dii_quart_trend, # DII quartile trend (0–3)
dii_etoh, # DII including alcohol (secondary)
# DII component nutrient quartiles
kcal_quart, carb_quart, protein_quart, totalfat_quart, satfat_quart,
pufa_quart, mufa_quart, n3fat_quart, n6fat_quart, choles_quart,
vitA_quart, bcarotene_quart, vitC_quart, vitE_quart,
vitB6_quart, vitB12_quart, folicacid_quart, thiamin_quart,
riboflavin_quart, niacin_quart, iron_quart, mg_quart, zn_quart, se_quart,
fiber_quart, caffeine_quart,
# Disease outcomes and family history (0/1)
asthma, # Asthma
ar, # Allergic rhinitis
arthritis, # Inflammatory arthritis
diabetes, # Diabetes
htn, # Hypertension
ad, # Eczema (2005–06)
fa, # Food allergy (2007–10)
fh_asthma # Family history of asthma (0/1)
)
## Set reference levels for key categorical variables used in regression models
nhanes_analytic <- nhanes_analytic |>
mutate(
race = fct_relevel(as.factor(race), "Non-Hispanic White"),
educ = fct_relevel(as.factor(educ), ">12 years"),
vitD25oh_cat = fct_relevel(as.factor(vitD25oh_cat), "≥30 (sufficient)"),
dii_quart = fct_relevel(as.factor(dii_quart), "Q1 (most anti-inflammatory)"),
bmi_cat = fct_relevel(as.factor(bmi_cat), "Normal (18.5–24.9)")
)
## Save analytic dataset for use in downstream regression and machine learning analyses
saveRDS(nhanes_analytic, file = "nhanes_analytic.rds")
## Add a constant for total population estimates
nhanes_analytic <- nhanes_analytic |> mutate(one = 1)
## Configure survey options for lonely PSUs
options(survey.lonely.psu = "adjust")
## Define interview-weighted survey design (8-year pooled weights)
des_int_8yr <- svydesign(
ids = ~sdmvpsu,
strata = ~sdmvstra,
weights = ~wtint8yr,
nest = TRUE,
data = nhanes_analytic
)
## Define MEC-weighted survey design (8-year pooled weights)
des_mec_8yr <- svydesign(
ids = ~sdmvpsu,
strata = ~sdmvstra,
weights = ~wtmec8yr,
nest = TRUE,
data = nhanes_analytic
)
## Total weighted population using MEC weights (stored)
total_pop_mec_8yr <- svytotal(~one, design = des_mec_8yr)
# Regression dataset generation ---------------------------------------------------------------
## Set survey and contrast options for regression models.
options(survey.lonely.psu = "adjust")
options(contrasts = c("contr.treatment", "contr.treatment"))
## Ensure that constant variable for total population estimates is present
if (!"one" %in% names(nhanes_analytic)) {
nhanes_analytic <- nhanes_analytic |>
mutate(one = 1)
}
## Construct outcome-specific analytic datasets restricted to non-missing outcomes
### Asthma (all cycles, non-missing)
asthma_reg <- nhanes_analytic |>
filter(!is.na(asthma))
### Allergic rhinitis (all cycles, non-missing)
ar_reg <- nhanes_analytic |>
filter(!is.na(ar))
### Inflammatory arthritis (all cycles, non-missing)
inflam_reg <- nhanes_analytic |>
filter(!is.na(arthritis))
### Hypertension (all cycles, non-missing)
htn_reg <- nhanes_analytic |>
filter(!is.na(htn))
### Diabetes (all cycles, non-missing)
dm_reg <- nhanes_analytic |>
filter(!is.na(diabetes))
### Eczema (2005–2006, non-missing)
ad_reg <- nhanes_analytic |>
filter(sddsrvyr == 4, !is.na(ad))
### Food allergy (2007–2010, non-missing)
fa_reg <- nhanes_analytic |>
filter(sddsrvyr %in% c(5, 6), !is.na(fa))
## Define helper to construct outcome-specific survey designs using 8-year MEC weights
make_design_wt8yr <- function(data) {
svydesign(
ids = ~sdmvpsu,
strata = ~sdmvstra,
weights = ~wtmec8yr,
nest = TRUE,
data = data
) |>
subset(wtmec8yr > 0)
}
## Construct outcome-specific survey designs for regression analyses
des_asthma_reg <- make_design_wt8yr(asthma_reg)
des_ar_reg <- make_design_wt8yr(ar_reg)
des_inflam_reg <- make_design_wt8yr(inflam_reg)
des_htn_reg <- make_design_wt8yr(htn_reg)
des_dm_reg <- make_design_wt8yr(dm_reg)
des_ad_reg <- make_design_wt8yr(ad_reg)
des_fa_reg <- make_design_wt8yr(fa_reg)
## Define list structure to organize outcome-specific datasets and survey designs
outcome_designs <- list(
asthma = list(
var = "asthma",
label = "Asthma",
data = asthma_reg,
design = des_asthma_reg
),
ar = list(
var = "ar",
label = "Allergic rhinitis",
data = ar_reg,
design = des_ar_reg
),
inflam = list(
var = "arthritis",
label = "Inflammatory arthritis",
data = inflam_reg,
design = des_inflam_reg
),
htn = list(
var = "htn",
label = "Hypertension",
data = htn_reg,
design = des_htn_reg
),
dm = list(
var = "diabetes",
label = "Diabetes",
data = dm_reg,
design = des_dm_reg
),
ad = list(
var = "ad",
label = "Eczema",
data = ad_reg,
design = des_ad_reg
),
fa = list(
var = "fa",
label = "Food allergy",
data = fa_reg,
design = des_fa_reg
)
)
# Variable information summary-----------------------------------------------------------------
## Full variable labels
var_labels <- c(
# IDs / design
seqn = "Participant ID",
sddsrvyr = "Survey cycle",
cycle_str = "Survey cycle label",
sdmvpsu = "Masked variance pseudo-PSU",
sdmvstra = "Masked variance pseudo-stratum",
wtint2yr = "Interview weight (2-year)",
wtmec2yr = "MEC exam weight (2-year)",
wtint8yr = "Interview weight (8-year pooled)",
wtmec8yr = "MEC exam weight (8-year pooled)",
# Demographics
age = "Age in years",
sex = "Sex",
race = "Race/ethnicity",
educ = "Education level",
pir = "Poverty-income ratio",
# Behavioral
smoking = "Smoking status",
phys_act = "Recreational physical activity level",
phys_act_trend = "Physical activity trend score",
food_sec = "Household food security status",
food_sec_score = "Food security ordinal score",
insur = "Any health insurance coverage",
any_alcohol = "Ever drank ≥12 drinks in any 1 year (ALQ101)",
alcohol = "Alcohol use",
# Anthropometrics and cardiometabolic
bmi = "Body mass index (kg/m^2)",
whtr = "Waist-to-height ratio",
sbp_mean = "Mean systolic blood pressure (mmHg)",
dbp_mean = "Mean diastolic blood pressure (mmHg)",
tchol = "Total cholesterol (mg/dL)",
hdl = "HDL cholesterol (mg/dL)",
non_hdl = "Non-HDL cholesterol (mg/dL)",
hba1c = "Hemoglobin A1c (%)",
bmi_cat = "BMI category",
bmi_cat_trend = "BMI category trend score",
whtr_cat = "Waist-to-height ratio category",
whtr_cat_trend = "WHtR category trend score",
sbp_cat = "Systolic blood pressure category",
sbp_cat_trend = "SBP category trend score",
dbp_cat = "Diastolic blood pressure category",
dbp_cat_trend = "DBP category trend score",
# Vitamin D / DII
vitD25oh = "Serum Vitamin D [25(OH)D] concentration (nmol/L)",
vitD25oh_cat = "Vitamin D status category",
vitD25oh_cat_trend = "Vitamin D status trend score",
dii = "Dietary Inflammatory Index (no alcohol)",
dii_quart = "DII quartile",
dii_quart_trend = "DII quartile trend score",
dii_etoh = "Dietary Inflammatory Index including alcohol",
# Nutrient quartiles
kcal_quart = "Total energy intake (kcal) quartile",
carb_quart = "Carbohydrate intake quartile",
protein_quart = "Protein intake quartile",
totalfat_quart = "Total fat intake quartile",
satfat_quart = "Saturated fat intake quartile",
pufa_quart = "Polyunsaturated fat intake quartile",
mufa_quart = "Monounsaturated fat intake quartile",
n3fat_quart = "Omega-3 fat intake quartile",
n6fat_quart = "Omega-6 fat intake quartile",
choles_quart = "Cholesterol intake quartile",
vitA_quart = "Vitamin A intake quartile",
bcarotene_quart = "Beta-carotene intake quartile",
vitC_quart = "Vitamin C intake quartile",
vitE_quart = "Vitamin E intake quartile",
vitB6_quart = "Vitamin B6 intake quartile",
vitB12_quart = "Vitamin B12 intake quartile",
folicacid_quart = "Folic acid intake quartile",
thiamin_quart = "Thiamin intake quartile",
riboflavin_quart = "Riboflavin intake quartile",
niacin_quart = "Niacin intake quartile",
iron_quart = "Iron intake quartile",
mg_quart = "Magnesium intake quartile",
zn_quart = "Zinc intake quartile",
se_quart = "Selenium intake quartile",
fiber_quart = "Fiber intake quartile",
caffeine_quart = "Caffeine intake quartile",
# Outcomes and related variables
asthma = "Reported asthma",
ar = "Reported allergic rhinitis (hay fever)",
arthritis = "Reported rheumatoid or psoriatic arthritis",
diabetes = "Reported diabetes",
htn = "Reported hypertension",
ad = "Reported eczema (atopic dermatitis; 2005–2006 only)",
fa = "Reported food allergy",
fh_asthma = "Family history of asthma"
)
## Define variable categories (design, core predictors, nutrient predictors,
## main exposures, outcomes, and selected covariates).
var_roles <- c(
# Design / ID
seqn = "id",
sddsrvyr = "design",
cycle_str = "design",
sdmvpsu = "design",
sdmvstra = "design",
wtint2yr = "weight_2yr",
wtmec2yr = "weight_2yr",
wtint8yr = "weight_8yr",
wtmec8yr = "weight_8yr",
# Core predictors
age = "predictor_core",
sex = "predictor_core",
race = "predictor_core",
educ = "predictor_core",
pir = "predictor_core",
smoking = "predictor_core",
phys_act = "predictor_core",
phys_act_trend = "predictor_core",
food_sec = "predictor_core",
food_sec_score = "predictor_core",
insur = "predictor_core",
any_alcohol = "predictor_core",
alcohol = "predictor_core",
bmi = "predictor_core",
whtr = "predictor_core",
sbp_mean = "predictor_core",
dbp_mean = "predictor_core",
tchol = "predictor_core",
hdl = "predictor_core",
non_hdl = "predictor_core",
hba1c = "predictor_core",
bmi_cat = "predictor_core",
bmi_cat_trend = "predictor_core",
whtr_cat = "predictor_core",
whtr_cat_trend = "predictor_core",
sbp_cat = "predictor_core",
sbp_cat_trend = "predictor_core",
dbp_cat = "predictor_core",
dbp_cat_trend = "predictor_core",
# Vitamin D / DII exposure domain
vitD25oh = "predictor_core",
vitD25oh_cat = "predictor_core",
vitD25oh_cat_trend = "predictor_core",
dii = "exposure_DII_main",
dii_quart = "exposure_DII_ordinal",
dii_quart_trend = "exposure_DII_trend",
dii_etoh = "exposure_DII_secondary",
# Nutrient quartiles
kcal_quart = "predictor_nutrient",
carb_quart = "predictor_nutrient",
protein_quart = "predictor_nutrient",
totalfat_quart = "predictor_nutrient",
satfat_quart = "predictor_nutrient",
pufa_quart = "predictor_nutrient",
mufa_quart = "predictor_nutrient",
n3fat_quart = "predictor_nutrient",
n6fat_quart = "predictor_nutrient",
choles_quart = "predictor_nutrient",
vitA_quart = "predictor_nutrient",
bcarotene_quart = "predictor_nutrient",
vitC_quart = "predictor_nutrient",
vitE_quart = "predictor_nutrient",
vitB6_quart = "predictor_nutrient",
vitB12_quart = "predictor_nutrient",
folicacid_quart = "predictor_nutrient",
thiamin_quart = "predictor_nutrient",
riboflavin_quart = "predictor_nutrient",
niacin_quart = "predictor_nutrient",
iron_quart = "predictor_nutrient",
mg_quart = "predictor_nutrient",
zn_quart = "predictor_nutrient",
se_quart = "predictor_nutrient",
fiber_quart = "predictor_nutrient",
caffeine_quart = "predictor_nutrient",
# Outcomes and related covariates
asthma = "outcome_primary",
ar = "outcome_primary",
arthritis = "outcome_primary",
diabetes = "outcome_primary",
htn = "outcome_primary",
ad = "outcome_secondary",
fa = "outcome_secondary",
fh_asthma = "predictor_family_history"
)
## Helper function to generate variable summary for data dictionary table
make_var_summary <- function(data, vars = names(data),
var_labels = NULL) {
map_dfr(vars, function(v) {
x <- data[[v]]
n <- length(x)
n_missing <- sum(is.na(x))
pct_missing <- if (n > 0) n_missing / n else NA_real_
type <- if (is.factor(x)) {
"factor"
} else if (is.character(x)) {
"character"
} else if (is.numeric(x)) {
"numeric"
} else if (is.logical(x)) {
"logical"
} else {
paste(class(x), collapse = ",")
}
if (is.numeric(x)) {
rng <- suppressWarnings(range(x, na.rm = TRUE))
min_val <- if (length(rng) == 2 && all(is.finite(rng))) rng[1] else NA_real_
max_val <- if (length(rng) == 2 && all(is.finite(rng))) rng[2] else NA_real_
levels_info <- NA_character_
} else {
min_val <- NA_real_
max_val <- NA_real_
if (is.factor(x)) {
lv <- levels(x)
} else {
lv <- unique(x)
lv <- lv[!is.na(lv)]
lv <- as.character(lv)
}
lv <- head(lv, 20L)
levels_info <- if (length(lv) == 0) NA_character_ else paste(lv, collapse = "; ")
}
label <- if (!is.null(var_labels) && !is.null(var_labels[[v]])) {
var_labels[[v]]
} else {
v
}
tibble(
variable = v,
label = label,
type = type,
n = n,
n_missing = n_missing,
pct_missing = pct_missing,
min = min_val,
max = max_val,
levels = levels_info
)
})
}
## Specify variables to include in data dictionary table
vars_for_dict <- c(
# Design / ID
"seqn","sddsrvyr","cycle_str","sdmvpsu","sdmvstra",
"wtint2yr","wtmec2yr","wtint8yr","wtmec8yr",
# Demographics
"age","sex","race","educ","pir",
# Behaviours / psychosocial
"smoking","phys_act","phys_act_trend",
"food_sec",
"insur","alcohol",
# Anthropometrics & cardio-metabolic
"bmi","whtr","sbp_mean","dbp_mean",
"tchol","hdl","non_hdl","hba1c",
"bmi_cat","bmi_cat_trend",
"whtr_cat","whtr_cat_trend",
"sbp_cat","sbp_cat_trend",
"dbp_cat","dbp_cat_trend",
# Vitamin D / DII
"vitD25oh","vitD25oh_cat","vitD25oh_cat_trend",
"dii","dii_quart","dii_quart_trend",
"dii_etoh",
# Nutrient quartiles
"kcal_quart","carb_quart","protein_quart","totalfat_quart","satfat_quart",
"pufa_quart","mufa_quart","n3fat_quart","n6fat_quart","choles_quart",
"vitA_quart","bcarotene_quart","vitC_quart","vitE_quart",
"vitB6_quart","vitB12_quart","folicacid_quart","thiamin_quart",
"riboflavin_quart","niacin_quart","iron_quart","mg_quart","zn_quart","se_quart",
"fiber_quart","caffeine_quart",
# Outcomes and related variables
"asthma","ar","arthritis","diabetes","htn","ad","fa",
"fh_asthma"
)
## Create variable summary for table
var_summary <- make_var_summary(
data = nhanes_analytic,
vars = vars_for_dict,
var_labels = var_labels
)
## Define grouping of variables for Table 1
table1_group_map <- c(
seqn = "Demographics",
sddsrvyr = "Demographics",
cycle_str = "Demographics",
sdmvpsu = "Demographics",
sdmvstra = "Demographics",
wtint2yr = "Demographics",
wtmec2yr = "Demographics",
wtint8yr = "Demographics",
wtmec8yr = "Demographics",
age = "Demographics",
sex = "Demographics",
race = "Demographics",
educ = "Demographics",
pir = "Demographics",
smoking = "Demographics",
phys_act = "Demographics",
phys_act_trend = "Demographics",
food_sec = "Demographics",
insur = "Demographics",
alcohol = "Demographics",
fh_asthma = "Demographics",
bmi = "Anthropometrics",
whtr = "Anthropometrics",
sbp_mean = "Anthropometrics",
dbp_mean = "Anthropometrics",
tchol = "Anthropometrics",
hdl = "Anthropometrics",
non_hdl = "Anthropometrics",
hba1c = "Anthropometrics",
bmi_cat = "Anthropometrics",
bmi_cat_trend = "Anthropometrics",
whtr_cat = "Anthropometrics",
whtr_cat_trend = "Anthropometrics",
sbp_cat = "Anthropometrics",
sbp_cat_trend = "Anthropometrics",
dbp_cat = "Anthropometrics",
dbp_cat_trend = "Anthropometrics",
vitD25oh = "Nutrients",
vitD25oh_cat = "Nutrients",
vitD25oh_cat_trend = "Nutrients",
dii = "Nutrients",
dii_quart = "Nutrients",
dii_quart_trend = "Nutrients",
dii_etoh = "Nutrients",
kcal_quart = "Nutrients",
carb_quart = "Nutrients",
protein_quart = "Nutrients",
totalfat_quart = "Nutrients",
satfat_quart = "Nutrients",
pufa_quart = "Nutrients",
mufa_quart = "Nutrients",
n3fat_quart = "Nutrients",
n6fat_quart = "Nutrients",
choles_quart = "Nutrients",
vitA_quart = "Nutrients",
bcarotene_quart = "Nutrients",
vitC_quart = "Nutrients",
vitE_quart = "Nutrients",
vitB6_quart = "Nutrients",
vitB12_quart = "Nutrients",
folicacid_quart = "Nutrients",
thiamin_quart = "Nutrients",
riboflavin_quart = "Nutrients",
niacin_quart = "Nutrients",
iron_quart = "Nutrients",
mg_quart = "Nutrients",
zn_quart = "Nutrients",
se_quart = "Nutrients",
fiber_quart = "Nutrients",
caffeine_quart = "Nutrients",
asthma = "Disease Outcomes",
ar = "Disease Outcomes",
arthritis = "Disease Outcomes",
diabetes = "Disease Outcomes",
htn = "Disease Outcomes",
ad = "Disease Outcomes",
fa = "Disease Outcomes"
)
## Define helper function to identify binary numeric variables
identify_binary <- function(x) {
vals <- unique(na.omit(x))
length(vals) <= 2 && all(vals %in% c(0, 1))
}
## Prepare Table 1
table1_df <- var_summary |>
mutate(
# Binary 0/1 variable handling
binary_flag = map_lgl(variable, ~ identify_binary(nhanes_analytic[[.x]])),
Type = case_when(
binary_flag ~ "binary",
type == "numeric" ~ "numeric",
TRUE ~ type
),
N = as.integer(n),
N_missing = as.integer(n_missing),
group = table1_group_map[variable]
) |>
select(
group,
Variable = variable,
Content = label,
Type,
N,
N_missing,
pct_missing,
Min = min,
Max = max,
Levels = levels,
binary_flag
) |>
mutate(
Min = if_else(binary_flag, 0, Min),
Max = if_else(binary_flag, 1, Max),
Levels = if_else(binary_flag, "0=No; 1=Yes", Levels),
# Fine-tune levels of certain variables
Levels = case_when(
Variable == "sddsrvyr" ~ "4=2005–06; 5=2007–08; 6=2009–10; 7=2011–12",
Variable == "bmi_cat" ~ "Underweight (<18.5); Normal (18.5–24.9); Overweight (25–29.9); Obesity I (30–34.9); Obesity II (35–39.9); Obesity III (≥40)",
Variable == "vitD25oh_cat" ~ "<20 (deficient); 20–29 (insufficient); ≥30 (sufficient)",
Variable == "educ" ~ "<9 years; 9–12 years; >12 years",
TRUE ~ Levels
)
) |>
arrange(
factor(
group,
levels = c("Demographics", "Anthropometrics", "Nutrients", "Disease Outcomes")
),
Variable
) |>
select(
group,
Variable,
Content,
Type,
N,
`N missing` = N_missing,
`% missing` = pct_missing,
Min,
Max,
Levels
)
## Generate Table 1
labs_vars <- c("bmi","whtr","tchol","hdl","non_hdl","hba1c","vitD25oh","dii","dii_etoh")
nhanes_table1_gt <- table1_df |>
gt(groupname_col = "group") |>
tab_header(
title = md("**Table 1: Extracted Variables: Types, Distributions, and Missingness**")
) |>
cols_label(
N = md("N"),
`N missing` = md("N missing"),
`% missing` = md("% missing")
) |>
# % missing always with 3 decimals
fmt_number(
columns = `% missing`,
decimals = 3
) |>
# Min/Max for selected continuous variables: 1 decimal
fmt_number(
columns = c(Min, Max),
decimals = 1,
rows = Variable %in% labs_vars & Type != "binary"
) |>
# Min/Max for all other non-binary numeric variables: 0 decimals
fmt_number(
columns = c(Min, Max),
decimals = 0,
rows = !(Variable %in% labs_vars) & Type != "binary"
) |>
# Min/Max for binary 0/1 variables: 0 decimals (0 and 1)
fmt_number(
columns = c(Min, Max),
decimals = 0,
rows = Type == "binary"
) |>
# For seqn, suppress thousand separators
fmt_number(
columns = c(Min, Max),
decimals = 0,
use_seps = FALSE,
rows = Variable == "seqn"
) |>
cols_align(align = "left") |>
tab_options(table.font.size = px(12)) |>
# Bold column headings and row group labels
tab_style(
style = list(
cell_text(weight = "bold")
),
locations = list(
cells_column_labels(everything()),
cells_row_groups()
)
)
# Additional adjustments for regression analyses-----------------------------------------------
## Add trend variables to outcome-specific survey designs
add_trend_vars_to_design <- function(design) {
vars <- design$variables
if ("vitD25oh_cat" %in% names(vars) && !"vitD25oh_cat_trend" %in% names(vars)) {
vars <- vars |>
mutate(
vitD25oh_cat_trend = if_else(
!is.na(vitD25oh_cat),
as.numeric(vitD25oh_cat) - 1,
NA_real_
)
)
}
if ("dii_quart" %in% names(vars) && !"dii_quart_trend" %in% names(vars)) {
vars <- vars |>
mutate(
dii_quart_trend = if_else(
!is.na(dii_quart),
as.numeric(dii_quart) - 1,
NA_real_
)
)
}
design$variables <- vars
design
}
des_asthma_reg <- add_trend_vars_to_design(des_asthma_reg)
des_ar_reg <- add_trend_vars_to_design(des_ar_reg)
des_inflam_reg <- add_trend_vars_to_design(des_inflam_reg)
des_htn_reg <- add_trend_vars_to_design(des_htn_reg)
des_dm_reg <- add_trend_vars_to_design(des_dm_reg)
des_ad_reg <- add_trend_vars_to_design(des_ad_reg)
des_fa_reg <- add_trend_vars_to_design(des_fa_reg)
## Full cohort design for characteristics and prevalence
make_design_wt8yr <- function(data) {
svydesign(
ids = ~sdmvpsu,
strata = ~sdmvstra,
weights = ~wtmec8yr,
nest = TRUE,
data = data
) |>
subset(wtmec8yr > 0)
}
des_full <- make_design_wt8yr(nhanes_analytic)
## Outcome lists
outcome_list <- list(
asthma = list(
var = "asthma",
design = des_asthma_reg,
label = "Asthma"
),
ar = list(
var = "ar",
design = des_ar_reg,
label = "Allergic rhinitis"
),
inflam = list(
var = "arthritis",
design = des_inflam_reg,
label = "Inflammatory arthritis"
),
htn = list(
var = "htn",
design = des_htn_reg,
label = "Hypertension"
),
dm = list(
var = "diabetes",
design = des_dm_reg,
label = "Diabetes"
)
)
outcome_list_secondary <- list(
ad = list(
var = "ad",
design = des_ad_reg,
label = "Atopic dermatitis"
),
fa = list(
var = "fa",
design = des_fa_reg,
label = "Food allergy"
)
)
## Labels for core covariates and DII-related predictors
uni1_labels <- c(
age = "Age (years)",
sex = "Sex",
race = "Race/ethnicity",
insur = "Insurance",
educ = "Education",
pir = "Poverty-income ratio",
food_sec = "Food security",
smoking = "Smoking status",
phys_act = "Physical activity",
alcohol = "Alcohol use",
sbp_cat = "Systolic blood pressure category",
dbp_cat = "Diastolic blood pressure category",
bmi_cat = "BMI category",
whtr_cat = "Waist-to-height ratio category",
hdl = "HDL cholesterol (mg/dL)",
non_hdl = "Non-HDL cholesterol (mg/dL)",
tchol = "Total cholesterol (mg/dL)",
hba1c = "HbA1c (%)",
vitD25oh_cat = "25(OH)D category",
vitD25oh_cat_trend = "25(OH)D category trend (0–2)",
dii = "Dietary Inflammatory Index (per unit)",
dii_quart_trend = "DII quartile trend (0–3)",
dii_etoh = "DII including alcohol (per unit)"
)
## Mapping of factor levels to labels for characteristics and univariable models
custom_order_uni1 <- tribble(
~predictor, ~level, ~variable_label, ~rank,
# Sex
"sex", "Male", "Sex: Male", 1,
"sex", "Female", "Sex: Female", 2,
# Race (matches nhanes_analytic$race levels)
"race", "Non-Hispanic White", "Race/ethnicity: Non-Hispanic White", 1,
"race", "Non-Hispanic Black", "Race/ethnicity: Non-Hispanic Black", 2,
"race", "Mexican American", "Race/ethnicity: Mexican American", 3,
"race", "Other Hispanic", "Race/ethnicity: Other Hispanic", 4,
"race", "Other/multiracial", "Race/ethnicity: Other race", 5,
# Insurance
"insur", "No", "Insurance: No", 1,
"insur", "Yes", "Insurance: Yes", 2,
# Education (matches nhanes_analytic$educ levels)
"educ", "<9 years", "Education: <9", 1,
"educ", "9–12 years", "Education: 9-12", 2,
"educ", ">12 years", "Education: >12", 3,
# Food security
"food_sec", "Full", "Food security: Full", 1,
"food_sec", "Marginal", "Food security: Marginal", 2,
"food_sec", "Low", "Food security: Low", 3,
"food_sec", "Very low", "Food security: Very low", 4,
# Smoking
"smoking", "Never", "Smoking status: Never", 1,
"smoking", "Current", "Smoking status: Current", 2,
"smoking", "Former", "Smoking status: Former", 3,
# Alcohol
"alcohol", "No", "Alcohol use: No", 1,
"alcohol", "Yes", "Alcohol use: Yes", 2,
# Physical activity
"phys_act", "Sedentary", "Physical activity: Sedentary", 1,
"phys_act", "Moderate", "Physical activity: Moderate", 2,
"phys_act", "Intense", "Physical activity: Intense", 3,
# SBP categories (match levels in nhanes_analytic)
"sbp_cat", "<120", "Systolic blood pressure category: <120", 1,
"sbp_cat", "120–129", "Systolic blood pressure category: 120-129", 2,
"sbp_cat", "130–139", "Systolic blood pressure category: 130-139", 3,
"sbp_cat", "140–159", "Systolic blood pressure category: 140-159", 4,
"sbp_cat", "≥160", "Systolic blood pressure category: >=160", 5,
# DBP categories
"dbp_cat", "<80", "Diastolic blood pressure category: <80", 1,
"dbp_cat", "80–89", "Diastolic blood pressure category: 80-89", 2,
"dbp_cat", "90–99", "Diastolic blood pressure category: 90-99", 3,
"dbp_cat", "≥100", "Diastolic blood pressure category: >=100", 4,
# BMI categories
"bmi_cat", "Underweight (<18.5)", "BMI category: Underweight", 1,
"bmi_cat", "Normal (18.5–24.9)", "BMI category: Normal weight", 2,
"bmi_cat", "Overweight (25–29.9)", "BMI category: Overweight", 3,
"bmi_cat", "Obesity I (30–34.9)", "BMI category: Obesity I", 4,
"bmi_cat", "Obesity II (35–39.9)", "BMI category: Obesity II", 5,
"bmi_cat", "Obesity III (≥40)", "BMI category: Obesity III", 6,
# WHtR categories
"whtr_cat", "<0.5", "Waist-to-height ratio category: <0.5", 1,
"whtr_cat", "0.5–0.59", "Waist-to-height ratio category: 0.5-0.59", 2,
"whtr_cat", "≥0.6", "Waist-to-height ratio category: >=0.6", 3,
# Vitamin D categories
"vitD25oh_cat", "<20 (deficient)", "25(OH)D category: <20", 1,
"vitD25oh_cat", "20–29 (insufficient)", "25(OH)D category: 20-29", 2,
"vitD25oh_cat", "≥30 (sufficient)", "25(OH)D category: >=30", 3
)
is_poly_term <- function(terms, predictor) {
any(grepl(paste0("^", predictor, "\\."), terms))
}
add_ref_and_label <- function(td, des, predictor, outcome_name, outcome_info, base_label = NULL) {
if (is.null(base_label)) {
base_label <- predictor
}
var_data <- des$variables[[predictor]]
## Numeric predictors have no reference row, only single row with label
if (is.numeric(var_data) || is.integer(var_data)) {
td <- td |>
mutate(variable_label = base_label)
return(td)
}
## Character predictors converted to factor
if (is.character(var_data)) {
var_data <- factor(var_data)
}
## Detect polynomial terms and assign linear or quadratic trend
if (is_poly_term(td$term, predictor)) {
td <- td |>
mutate(
variable_label = case_when(
grepl("\\.L$", term) ~ paste0(base_label, ": linear trend"),
grepl("\\.Q$", term) ~ paste0(base_label, ": quadratic trend"),
TRUE ~ paste0(base_label, ": ", term)
)
)
return(td)
}
## Add reference row and level labels for factor predictors
if (is.factor(var_data)) {
levels_vec <- levels(var_data)
pattern <- paste0("^", predictor)
td_nonref <- td |>
mutate(
level = sub(pattern, "", term),
variable_label = paste0(base_label, ": ", level)
)
ref_level <- levels_vec[1]
ref_label <- paste0(base_label, ": ", ref_level)
ref_row <- tibble(
term = paste0(predictor, ref_level),
estimate = 1,
conf.low = 1,
conf.high = 1,
p.value = NA_real_,
outcome = outcome_name,
outcome_lab = outcome_info$label,
predictor = predictor,
level = ref_level,
variable_label = ref_label
)
td_full <- bind_rows(ref_row, td_nonref)
return(td_full)
}
td |>
mutate(variable_label = base_label)
}
# Print Table 1 variable summary---------------------------------------------------------------
nhanes_table1_gt| Table 1: Extracted Variables: Types, Distributions, and Missingness | ||||||||
|---|---|---|---|---|---|---|---|---|
| Variable | Content | Type | N | N missing | Min | Max | Levels | |
| Demographics | ||||||||
| age | Age in years | numeric | 7803 | 0 | 0.000 | 20 | 40 | NA |
| alcohol | Alcohol use | factor | 7803 | 1192 | 0.153 | NA | NA | No; Yes |
| cycle_str | Survey cycle label | character | 7803 | 0 | 0.000 | NA | NA | 2005-2006; 2007-2008; 2009-2010; 2011-2012 |
| educ | Education level | factor | 7803 | 7 | 0.001 | NA | NA | <9 years; 9–12 years; >12 years |
| fh_asthma | Family history of asthma | binary | 7803 | 131 | 0.017 | 0 | 1 | 0=No; 1=Yes |
| food_sec | Household food security status | factor | 7803 | 1539 | 0.197 | NA | NA | Full; Marginal; Low; Very low |
| insur | Any health insurance coverage | factor | 7803 | 7 | 0.001 | NA | NA | No; Yes |
| phys_act | Recreational physical activity level | factor | 7803 | 21 | 0.003 | NA | NA | Sedentary; Moderate; Intense |
| phys_act_trend | Physical activity trend score | numeric | 7803 | 21 | 0.003 | 0 | 2 | NA |
| pir | Poverty-income ratio | numeric | 7803 | 565 | 0.072 | 0 | 5 | NA |
| race | Race/ethnicity | factor | 7803 | 742 | 0.095 | NA | NA | Non-Hispanic White; Non-Hispanic Black; Mexican American; Other Hispanic; Other/multiracial |
| sddsrvyr | Survey cycle | numeric | 7803 | 0 | 0.000 | 4 | 7 | 4=2005–06; 5=2007–08; 6=2009–10; 7=2011–12 |
| sdmvpsu | Masked variance pseudo-PSU | numeric | 7803 | 0 | 0.000 | 1 | 3 | NA |
| sdmvstra | Masked variance pseudo-stratum | numeric | 7803 | 0 | 0.000 | 44 | 103 | NA |
| seqn | Participant ID | numeric | 7803 | 0 | 0.000 | 31144 | 71912 | NA |
| sex | Sex | factor | 7803 | 0 | 0.000 | NA | NA | Male; Female |
| smoking | Smoking status | factor | 7803 | 7 | 0.001 | NA | NA | Never; Former; Current |
| wtint2yr | Interview weight (2-year) | numeric | 7803 | 0 | 0.000 | 1,339 | 152,859 | NA |
| wtint8yr | Interview weight (8-year pooled) | numeric | 7803 | 0 | 0.000 | 335 | 38,215 | NA |
| wtmec2yr | MEC exam weight (2-year) | numeric | 7803 | 0 | 0.000 | 0 | 163,393 | NA |
| wtmec8yr | MEC exam weight (8-year pooled) | numeric | 7803 | 0 | 0.000 | 0 | 40,848 | NA |
| Anthropometrics | ||||||||
| bmi | Body mass index (kg/m^2) | numeric | 7803 | 347 | 0.044 | 14.7 | 76.1 | NA |
| bmi_cat | BMI category | factor | 7803 | 347 | 0.044 | NA | NA | Underweight (<18.5); Normal (18.5–24.9); Overweight (25–29.9); Obesity I (30–34.9); Obesity II (35–39.9); Obesity III (≥40) |
| bmi_cat_trend | BMI category trend score | numeric | 7803 | 347 | 0.044 | 0 | 5 | NA |
| dbp_cat | Diastolic blood pressure category | factor | 7803 | 715 | 0.092 | NA | NA | <80; 80–89; 90–99; ≥100 |
| dbp_cat_trend | DBP category trend score | numeric | 7803 | 715 | 0.092 | 0 | 3 | NA |
| dbp_mean | Mean diastolic blood pressure (mmHg) | numeric | 7803 | 715 | 0.092 | 3 | 125 | NA |
| hba1c | Hemoglobin A1c (%) | numeric | 7803 | 770 | 0.099 | 3.6 | 15.9 | NA |
| hdl | HDL cholesterol (mg/dL) | numeric | 7803 | 811 | 0.104 | 7.0 | 149.0 | NA |
| non_hdl | Non-HDL cholesterol (mg/dL) | numeric | 7803 | 811 | 0.104 | 27.0 | 494.0 | NA |
| sbp_cat | Systolic blood pressure category | factor | 7803 | 708 | 0.091 | NA | NA | <120; 120–129; 130–139; 140–159; ≥160 |
| sbp_cat_trend | SBP category trend score | numeric | 7803 | 708 | 0.091 | 0 | 4 | NA |
| sbp_mean | Mean systolic blood pressure (mmHg) | numeric | 7803 | 708 | 0.091 | 80 | 196 | NA |
| tchol | Total cholesterol (mg/dL) | numeric | 7803 | 811 | 0.104 | 59.0 | 523.0 | NA |
| whtr | Waist-to-height ratio | numeric | 7803 | 633 | 0.081 | 0.4 | 1.1 | NA |
| whtr_cat | Waist-to-height ratio category | factor | 7803 | 633 | 0.081 | NA | NA | <0.5; 0.5–0.59; ≥0.6 |
| whtr_cat_trend | WHtR category trend score | numeric | 7803 | 633 | 0.081 | 0 | 2 | NA |
| Nutrients | ||||||||
| bcarotene_quart | Beta-carotene intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| caffeine_quart | Caffeine intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| carb_quart | Carbohydrate intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| choles_quart | Cholesterol intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| dii | Dietary Inflammatory Index (no alcohol) | numeric | 7803 | 745 | 0.095 | −5.7 | 8.1 | NA |
| dii_etoh | Dietary Inflammatory Index including alcohol | numeric | 7803 | 745 | 0.095 | −5.7 | 8.3 | NA |
| dii_quart | DII quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1 (most anti-inflammatory); Q2; Q3; Q4 (most pro-inflammatory) |
| dii_quart_trend | DII quartile trend score | numeric | 7803 | 745 | 0.095 | 0 | 3 | NA |
| fiber_quart | Fiber intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| folicacid_quart | Folic acid intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| iron_quart | Iron intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| kcal_quart | Total energy intake (kcal) quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| mg_quart | Magnesium intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| mufa_quart | Monounsaturated fat intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| n3fat_quart | Omega-3 fat intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| n6fat_quart | Omega-6 fat intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| niacin_quart | Niacin intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| protein_quart | Protein intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| pufa_quart | Polyunsaturated fat intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| riboflavin_quart | Riboflavin intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| satfat_quart | Saturated fat intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| se_quart | Selenium intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| thiamin_quart | Thiamin intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| totalfat_quart | Total fat intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| vitA_quart | Vitamin A intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| vitB12_quart | Vitamin B12 intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| vitB6_quart | Vitamin B6 intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| vitC_quart | Vitamin C intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| vitD25oh | Serum Vitamin D [25(OH)D] concentration (nmol/L) | numeric | 7803 | 1054 | 0.135 | 6.2 | 213.0 | NA |
| vitD25oh_cat | Vitamin D status category | factor | 7803 | 1054 | 0.135 | NA | NA | <20 (deficient); 20–29 (insufficient); ≥30 (sufficient) |
| vitD25oh_cat_trend | Vitamin D status trend score | numeric | 7803 | 1054 | 0.135 | 0 | 2 | NA |
| vitE_quart | Vitamin E intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| zn_quart | Zinc intake quartile | factor | 7803 | 745 | 0.095 | NA | NA | Q1; Q2; Q3; Q4 |
| Disease Outcomes | ||||||||
| ad | Reported eczema (atopic dermatitis; 2005–2006 only) | binary | 7803 | 6108 | 0.783 | 0 | 1 | 0=No; 1=Yes |
| ar | Reported allergic rhinitis (hay fever) | binary | 7803 | 20 | 0.003 | 0 | 1 | 0=No; 1=Yes |
| arthritis | Reported rheumatoid or psoriatic arthritis | binary | 7803 | 188 | 0.024 | 0 | 1 | 0=No; 1=Yes |
| asthma | Reported asthma | binary | 7803 | 6 | 0.001 | 0 | 1 | 0=No; 1=Yes |
| diabetes | Reported diabetes | binary | 7803 | 7 | 0.001 | 0 | 1 | 0=No; 1=Yes |
| fa | Reported food allergy | binary | 7803 | 3713 | 0.476 | 0 | 1 | 0=No; 1=Yes |
| htn | Reported hypertension | binary | 7803 | 14 | 0.002 | 0 | 1 | 0=No; 1=Yes |
4 Results
Cohort characteristics
Cohort characteristics are shown in Table 2 for the full cohort (2005–2012) and in Table 3 for the time-restricted cohorts for eczema (2005–2006) and food allergy (2007–2010). In general, participants with asthma, allergic rhinitis, eczema, and food allergy were similar to the respective full cohort populations, with only modest differences (e.g., higher proportions of males and small variations in physical activity and smoking patterns). Patients with allergic rhinitis, inflammatory arthritis, diabetes, and hypertension were slightly older than the full cohort. Those with inflammatory arthritis, diabetes, and hypertension also exhibited higher DII scores, while participants with allergic rhinitis had slightly lower DII scores relative to the full cohort. There were smaller differences in DII scores in the eczema and food allergy cohorts relative to their respective time-restricted full cohorts.
# Continuous covariates and categorical levels ----------------------------------------------
## Continuous characteristics to summarize as mean (SE)
char_continuous <- c("age", "pir", "hdl", "non_hdl", "tchol", "hba1c", "dii", "dii_etoh")
## Predictor order
char_order <- c(
"age",
"sex",
"race",
"insur",
"educ",
"pir",
"food_sec",
"smoking",
"phys_act",
"alcohol",
"sbp_cat",
"dbp_cat",
"bmi_cat",
"whtr_cat",
"hdl",
"non_hdl",
"tchol",
"hba1c",
"vitD25oh_cat",
"dii",
"dii_etoh"
)
## Characteristics to include
char_skeleton_raw <- bind_rows(
tibble(
predictor = char_continuous,
level = NA_character_,
variable_label = uni1_labels[char_continuous],
type = "continuous",
rank = NA_integer_
),
custom_order_uni1 |>
mutate(type = "categorical")
)
## Characteristic ordering
char_skeleton <- char_skeleton_raw |>
mutate(
predictor = factor(
predictor,
levels = c(char_order, setdiff(unique(predictor), char_order))
)
) |>
arrange(predictor, rank) |>
mutate(
predictor = as.character(predictor)
)
## Helper function to compute characteristics for a single survey design
compute_char_for_group <- function(design, group_name) {
vars <- design$variables
map_dfr(
seq_len(nrow(char_skeleton)),
function(i) {
row <- char_skeleton[i, ]
predictor <- row$predictor
label <- row$variable_label
type <- row$type
lvl <- row$level
# If predictor not present, return NA
if (!predictor %in% names(vars)) {
return(tibble(
Group = group_name,
Characteristic = label,
Value = NA_character_
))
}
if (type == "continuous") {
# Continuous: survey-weighted mean (SE)
f <- as.formula(paste0("~", predictor))
est <- svymean(f, design, na.rm = TRUE)
m <- as.numeric(est)[1]
se <- as.numeric(SE(est))[1]
value <- sprintf("%.2f (%.2f)", m, se)
} else {
# Categorical: survey-weighted prevalence (SE) for each specified level
var <- vars[[predictor]]
# Coerce to factor and drop unused levels
if (!is.factor(var)) {
var <- factor(var)
} else {
var <- droplevels(var)
}
# If the target level is missing or not in the factor levels, return NA
if (is.na(lvl) || !lvl %in% levels(var)) {
return(tibble(
Group = group_name,
Characteristic = label,
Value = NA_character_
))
}
# Restrict to non-missing values for this predictor
nonmiss <- !is.na(var)
if (!any(nonmiss)) {
return(tibble(
Group = group_name,
Characteristic = label,
Value = NA_character_
))
}
# Subset design to non-missing rows for predictor
design_nm <- subset(design, nonmiss)
# Recompute vars for subsetted design and attach clean indicator
vars_nm <- design_nm$variables
vars_nm$indicator <- as.numeric(var[nonmiss] == lvl)
design_nm$variables <- vars_nm
est <- tryCatch(
svymean(~indicator, design_nm, na.rm = TRUE),
error = function(e) NA
)
if (all(is.na(est))) {
value <- NA_character_
} else {
p <- as.numeric(est)[1] * 100
se <- as.numeric(SE(est))[1] * 100
value <- sprintf("%.1f%% (%.1f)", p, se)
}
}
tibble(
Group = group_name,
Characteristic = label,
Value = value
)
}
)
}
# Full cohort and primary outcomes -----------------------------------------------------------
des_full_primary <- des_full
des_asthma_cases <- subset(des_asthma_reg, asthma == 1)
des_ar_cases <- subset(des_ar_reg, ar == 1)
des_inflam_cases <- subset(des_inflam_reg, arthritis == 1)
des_dm_cases <- subset(des_dm_reg, diabetes == 1)
des_htn_cases <- subset(des_htn_reg, htn == 1)
group_designs_primary <- list(
"Full cohort" = des_full_primary,
"Asthma" = des_asthma_cases,
"Allergic rhinitis" = des_ar_cases,
"Inflammatory arthritis" = des_inflam_cases,
"Diabetes" = des_dm_cases,
"Hypertension" = des_htn_cases
)
char_primary_long <- imap_dfr(
group_designs_primary,
~ compute_char_for_group(design = .x, group_name = .y)
)
## Rename characteristics for DII and Vitamin D
char_primary_wide <- char_primary_long |>
mutate(
Characteristic = dplyr::recode(
Characteristic,
"Dietary Inflammatory Index (per unit)" = "Dietary Inflammatory Index (DII)",
"DII including alcohol (per unit)" = "DII including alcohol"
),
Characteristic = gsub(
"25\\(OH\\)D category",
"Vitamin D category",
Characteristic
)
) |>
pivot_wider(
id_cols = Characteristic,
names_from = Group,
values_from = Value
)
tbl_char_primary <- char_primary_wide |>
gt() |>
tab_header(
title = md(
"**Table 2: Characteristics of Full Cohort (2005–2012) and Participants with Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension**"
)
) |>
fmt_markdown(columns = everything()) |>
tab_options(
table.font.size = px(10),
column_labels.font.weight = "bold"
) |>
tab_footnote(
footnote = "For each characteristic, survey-weighted means (SE) are shown for continuous variables, and percentages of total within that category are shown for categorical variables.",
locations = cells_title(groups = "title")
)
# Time-restricted eczema and food allergy cohorts -------------------------------------------
ad_cohort <- nhanes_analytic |>
filter(sddsrvyr == 4, !is.na(ad))
fa_cohort <- nhanes_analytic |>
filter(sddsrvyr %in% c(5, 6), !is.na(fa))
des_ad_cohort <- make_design_wt8yr(ad_cohort)
des_fa_cohort <- make_design_wt8yr(fa_cohort)
des_ad_cases <- subset(des_ad_reg, ad == 1)
des_fa_cases <- subset(des_fa_reg, fa == 1)
group_designs_secondary <- list(
"Cohort (2005–2006)" = des_ad_cohort,
"Eczema" = des_ad_cases,
"Cohort (2007–2010)" = des_fa_cohort,
"Food allergy" = des_fa_cases
)
char_secondary_long <- imap_dfr(
group_designs_secondary,
~ compute_char_for_group(design = .x, group_name = .y)
)
## Rename characteristics for DII and Vitamin D
char_secondary_wide <- char_secondary_long |>
mutate(
Characteristic = dplyr::recode(
Characteristic,
"Dietary Inflammatory Index (per unit)" = "Dietary Inflammatory Index (DII)",
"DII including alcohol (per unit)" = "DII including alcohol"
),
Characteristic = gsub(
"25\\(OH\\)D category",
"Vitamin D category",
Characteristic
)
) |>
pivot_wider(
id_cols = Characteristic,
names_from = Group,
values_from = Value
)
tbl_char_secondary <- char_secondary_wide |>
gt() |>
tab_header(
title = md(
"**Table 3: Characteristics of Time-Restricted Subcohorts and Participants with Eczema (2005–2006) and Food Allergy (2007–2010)**"
)
) |>
fmt_markdown(columns = everything()) |>
tab_options(
table.font.size = px(10),
column_labels.font.weight = "bold"
) |>
tab_footnote(
footnote = "For each characteristic, survey-weighted means (SE) are shown for continuous variables, and percentages of total within that category are shown for categorical variables.",
locations = cells_title(groups = "title")
)
# Print cohort characteristic tables --------------------------------------------------------
tbl_char_primary| Table 2: Characteristics of Full Cohort (2005–2012) and Participants with Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension1 | ||||||
|---|---|---|---|---|---|---|
| Characteristic | Full cohort | Asthma | Allergic rhinitis | Inflammatory arthritis | Diabetes | Hypertension |
| Age (years) | 30.03 (0.15) | 29.18 (0.26) | 32.09 (0.22) | 33.26 (0.71) | 33.55 (0.44) | 32.77 (0.24) |
| Sex: Male | 51.3% (0.6) | 45.1% (1.7) | 44.2% (1.6) | 31.7% (4.3) | 45.9% (4.8) | 55.4% (1.8) |
| Sex: Female | 48.7% (0.6) | 54.9% (1.7) | 55.8% (1.6) | 68.3% (4.3) | 54.1% (4.8) | 44.6% (1.8) |
| Race/ethnicity: Non-Hispanic White | 65.9% (2.0) | 71.6% (2.3) | 77.7% (2.0) | 74.2% (4.6) | 55.9% (4.7) | 63.2% (2.9) |
| Race/ethnicity: Non-Hispanic Black | 13.8% (1.0) | 15.4% (1.5) | 10.7% (1.1) | 14.2% (3.6) | 22.5% (3.3) | 20.7% (2.0) |
| Race/ethnicity: Mexican American | 13.1% (1.2) | 6.6% (1.0) | 5.9% (0.9) | 7.6% (2.2) | 11.6% (2.2) | 10.3% (1.3) |
| Race/ethnicity: Other Hispanic | 7.3% (0.9) | 6.3% (1.1) | 5.7% (1.1) | 4.0% (1.7) | 10.1% (2.5) | 5.8% (1.1) |
| Race/ethnicity: Other race | NA | NA | NA | NA | NA | NA |
| Insurance: No | 30.5% (1.0) | 26.7% (1.6) | 23.9% (1.9) | 16.7% (3.8) | 24.0% (3.7) | 25.6% (1.6) |
| Insurance: Yes | 69.5% (1.0) | 73.3% (1.6) | 76.1% (1.9) | 83.3% (3.8) | 76.0% (3.7) | 74.4% (1.6) |
| Education: <9 | 4.6% (0.4) | 1.9% (0.5) | 2.5% (0.6) | 7.2% (3.2) | 6.0% (2.1) | 3.4% (0.8) |
| Education: 9-12 | 34.4% (1.3) | 34.5% (2.0) | 26.4% (2.0) | 38.6% (5.5) | 39.1% (4.4) | 37.1% (1.9) |
| Education: >12 | 61.0% (1.4) | 63.6% (2.2) | 71.1% (2.2) | 54.3% (7.1) | 54.9% (4.8) | 59.5% (2.2) |
| Poverty-income ratio | 2.70 (0.05) | 2.59 (0.09) | 2.96 (0.09) | 2.24 (0.17) | 2.34 (0.13) | 2.58 (0.07) |
| Food security: Full | 83.5% (0.8) | 80.5% (1.6) | 87.5% (1.3) | 74.2% (4.1) | 76.1% (4.5) | 80.5% (1.8) |
| Food security: Marginal | 12.6% (0.7) | 14.5% (1.6) | 9.8% (1.0) | 19.7% (4.3) | 18.1% (4.3) | 15.7% (1.6) |
| Food security: Low | 2.6% (0.2) | 2.3% (0.4) | 1.8% (0.6) | 2.5% (1.5) | 2.4% (1.3) | 2.5% (0.5) |
| Food security: Very low | 1.3% (0.2) | 2.6% (0.4) | 0.9% (0.2) | 3.6% (2.1) | 3.4% (1.9) | 1.3% (0.5) |
| Smoking status: Never | 58.8% (1.1) | 53.1% (2.2) | 57.5% (2.4) | 39.8% (5.3) | 59.1% (4.2) | 49.1% (2.0) |
| Smoking status: Current | 26.9% (0.9) | 33.6% (2.0) | 25.4% (1.9) | 46.6% (6.0) | 28.1% (4.0) | 34.0% (2.1) |
| Smoking status: Former | 14.3% (0.6) | 13.4% (1.3) | 17.1% (1.8) | 13.6% (4.1) | 12.8% (3.6) | 16.9% (1.5) |
| Physical activity: Sedentary | 34.5% (1.0) | 32.2% (1.5) | 29.3% (1.7) | 37.4% (5.1) | 44.9% (3.8) | 38.0% (2.1) |
| Physical activity: Moderate | 24.4% (0.7) | 25.3% (1.6) | 28.8% (1.8) | 38.8% (5.3) | 32.9% (4.7) | 27.5% (1.6) |
| Physical activity: Intense | 41.1% (1.2) | 42.4% (2.1) | 41.9% (2.2) | 23.8% (4.9) | 22.2% (3.8) | 34.5% (2.0) |
| Alcohol use: No | 19.0% (0.7) | 16.2% (1.4) | 16.5% (1.5) | 25.1% (4.3) | 26.9% (3.7) | 17.4% (1.3) |
| Alcohol use: Yes | 81.0% (0.7) | 83.8% (1.4) | 83.5% (1.5) | 74.9% (4.3) | 73.1% (3.7) | 82.6% (1.3) |
| Systolic blood pressure category: <120 | 68.5% (0.9) | 67.4% (1.5) | 70.6% (1.8) | 74.2% (5.9) | 42.6% (5.3) | 38.5% (2.6) |
| Systolic blood pressure category: 120-129 | 21.3% (0.7) | 23.1% (1.4) | 20.2% (1.5) | 17.4% (5.6) | 37.3% (4.7) | 31.7% (2.3) |
| Systolic blood pressure category: 130-139 | 6.7% (0.4) | 6.1% (0.7) | 6.1% (1.0) | 8.4% (2.6) | 13.3% (3.4) | 16.8% (1.8) |
| Systolic blood pressure category: 140-159 | 3.3% (0.3) | 3.0% (0.6) | 2.9% (0.5) | NA | 5.1% (1.3) | 10.9% (1.2) |
| Systolic blood pressure category: >=160 | 0.3% (0.1) | 0.4% (0.2) | 0.2% (0.1) | NA | 1.6% (1.2) | 2.1% (0.4) |
| Diastolic blood pressure category: <80 | 84.9% (0.8) | 84.1% (1.3) | 82.8% (1.6) | 83.5% (4.3) | 66.3% (4.1) | 64.2% (2.5) |
| Diastolic blood pressure category: 80-89 | 11.7% (0.6) | 12.7% (1.3) | 14.4% (1.5) | 11.0% (3.4) | 25.0% (3.7) | 23.6% (2.0) |
| Diastolic blood pressure category: 90-99 | 2.8% (0.2) | 2.5% (0.5) | 2.5% (0.6) | 5.5% (3.3) | 8.0% (2.5) | 9.0% (1.0) |
| Diastolic blood pressure category: >=100 | 0.7% (0.1) | 0.7% (0.2) | 0.3% (0.2) | NA | 0.7% (0.5) | 3.2% (0.6) |
| BMI category: Underweight | 2.3% (0.2) | 1.2% (0.3) | 1.8% (0.5) | 0.4% (0.4) | NA | 0.7% (0.3) |
| BMI category: Normal weight | 36.5% (1.0) | 33.9% (2.1) | 38.5% (2.4) | 26.7% (6.6) | 16.9% (3.5) | 18.2% (1.7) |
| BMI category: Overweight | 30.3% (0.8) | 27.6% (1.6) | 29.7% (2.5) | 22.0% (5.3) | 23.8% (3.7) | 25.0% (2.1) |
| BMI category: Obesity I | 17.5% (0.6) | 19.5% (1.4) | 16.3% (1.4) | 25.3% (5.8) | 16.6% (3.2) | 25.7% (1.9) |
| BMI category: Obesity II | 7.9% (0.4) | 10.1% (1.1) | 8.3% (1.1) | 13.5% (3.0) | 16.7% (3.7) | 15.9% (1.4) |
| BMI category: Obesity III | 5.6% (0.3) | 7.7% (0.9) | 5.4% (0.9) | 12.0% (3.5) | 25.9% (3.6) | 14.6% (1.4) |
| Waist-to-height ratio category: <0.5 | 32.9% (1.1) | 31.8% (2.1) | 32.8% (2.1) | 25.3% (6.5) | 15.8% (3.9) | 15.0% (1.5) |
| Waist-to-height ratio category: 0.5-0.59 | 40.1% (0.9) | 35.3% (1.8) | 39.9% (2.0) | 34.9% (6.2) | 19.8% (3.5) | 34.8% (1.8) |
| Waist-to-height ratio category: >=0.6 | 27.1% (1.0) | 32.9% (1.9) | 27.3% (2.0) | 39.8% (6.3) | 64.4% (4.8) | 50.2% (1.8) |
| HDL cholesterol (mg/dL) | 51.20 (0.29) | 51.37 (0.60) | 51.80 (0.66) | 51.40 (2.33) | 47.78 (1.46) | 46.61 (0.67) |
| Non-HDL cholesterol (mg/dL) | 135.62 (0.63) | 132.25 (1.30) | 136.81 (1.75) | 140.36 (4.26) | 151.29 (4.03) | 147.60 (1.81) |
| Total cholesterol (mg/dL) | 186.82 (0.55) | 183.61 (1.13) | 188.61 (1.49) | 191.76 (3.97) | 199.07 (3.80) | 194.21 (1.69) |
| HbA1c (%) | 5.29 (0.01) | 5.29 (0.02) | 5.33 (0.03) | 5.40 (0.09) | 7.92 (0.21) | 5.57 (0.04) |
| Vitamin D category: <20 | 1.2% (0.2) | 1.1% (0.3) | 0.9% (0.3) | 1.4% (1.1) | 0.7% (0.5) | 2.0% (0.5) |
| Vitamin D category: 20-29 | 6.2% (0.5) | 6.1% (0.8) | 4.6% (0.8) | 5.6% (2.0) | 10.5% (2.3) | 7.0% (0.9) |
| Vitamin D category: >=30 | 92.6% (0.6) | 92.8% (0.9) | 94.6% (0.8) | 93.0% (2.5) | 88.8% (2.4) | 91.0% (1.2) |
| Dietary Inflammatory Index (DII) | 2.54 (0.06) | 2.60 (0.12) | 2.30 (0.13) | 3.29 (0.36) | 2.82 (0.24) | 2.78 (0.11) |
| DII including alcohol | 2.68 (0.06) | 2.74 (0.12) | 2.44 (0.13) | 3.49 (0.36) | 3.03 (0.24) | 2.92 (0.11) |
| 1 For each characteristic, survey-weighted means (SE) are shown for continuous variables, and percentages of total within that category are shown for categorical variables. | ||||||
tbl_char_secondary| Table 3: Characteristics of Time-Restricted Subcohorts and Participants with Eczema (2005–2006) and Food Allergy (2007–2010)1 | ||||
|---|---|---|---|---|
| Characteristic | Cohort (2005–2006) | Eczema | Cohort (2007–2010) | Food allergy |
| Age (years) | 30.26 (0.22) | 29.86 (0.97) | 30.07 (0.15) | 30.23 (0.26) |
| Sex: Male | 51.8% (1.2) | 39.9% (6.1) | 51.4% (0.9) | 43.9% (2.9) |
| Sex: Female | 48.2% (1.2) | 60.1% (6.1) | 48.6% (0.9) | 56.1% (2.9) |
| Race/ethnicity: Non-Hispanic White | 68.3% (2.9) | 79.5% (4.5) | 65.9% (2.8) | 63.0% (3.9) |
| Race/ethnicity: Non-Hispanic Black | 13.7% (2.1) | 16.1% (5.0) | 13.6% (1.2) | 19.4% (2.6) |
| Race/ethnicity: Mexican American | 12.7% (1.5) | 3.0% (1.1) | 13.3% (1.8) | 8.2% (1.4) |
| Race/ethnicity: Other Hispanic | 5.2% (1.3) | 1.4% (1.0) | 7.2% (1.2) | 9.4% (2.2) |
| Race/ethnicity: Other race | NA | NA | NA | NA |
| Insurance: No | 29.5% (2.4) | 21.6% (4.6) | 31.4% (1.2) | 26.2% (3.0) |
| Insurance: Yes | 70.5% (2.4) | 78.4% (4.6) | 68.6% (1.2) | 73.8% (3.0) |
| Education: <9 | 5.3% (0.9) | 0.6% (0.1) | 4.6% (0.5) | 2.7% (1.0) |
| Education: 9-12 | 35.8% (2.4) | 37.4% (6.0) | 36.7% (1.7) | 28.8% (2.9) |
| Education: >12 | 59.0% (2.8) | 61.9% (6.0) | 58.7% (1.7) | 68.5% (3.2) |
| Poverty-income ratio | 2.91 (0.06) | 3.03 (0.13) | 2.68 (0.05) | 2.74 (0.09) |
| Food security: Full | 76.7% (1.6) | 72.3% (5.1) | 87.4% (0.9) | 89.7% (2.0) |
| Food security: Marginal | 9.5% (0.9) | 13.2% (2.6) | 12.6% (0.9) | 10.3% (2.0) |
| Food security: Low | 9.2% (0.9) | 8.3% (3.5) | NA | NA |
| Food security: Very low | 4.6% (0.6) | 6.1% (2.3) | NA | NA |
| Smoking status: Never | 55.9% (1.4) | 46.9% (6.5) | 58.2% (1.7) | 57.2% (2.8) |
| Smoking status: Current | 29.5% (1.8) | 40.1% (6.2) | 27.6% (1.2) | 25.7% (2.5) |
| Smoking status: Former | 14.6% (0.8) | 13.0% (5.2) | 14.2% (1.0) | 17.2% (2.2) |
| Physical activity: Sedentary | 24.7% (1.6) | 20.7% (5.3) | 39.7% (1.3) | 36.0% (3.5) |
| Physical activity: Moderate | 24.0% (0.9) | 25.6% (5.2) | 23.8% (1.0) | 26.1% (2.7) |
| Physical activity: Intense | 51.2% (1.5) | 53.8% (6.4) | 36.5% (1.7) | 37.9% (3.4) |
| Alcohol use: No | 21.2% (1.9) | 14.9% (5.2) | 19.1% (0.9) | 22.9% (2.3) |
| Alcohol use: Yes | 78.8% (1.9) | 85.1% (5.2) | 80.9% (0.9) | 77.1% (2.3) |
| Systolic blood pressure category: <120 | 65.9% (1.9) | 65.4% (6.6) | 69.4% (1.1) | 76.4% (3.1) |
| Systolic blood pressure category: 120-129 | 21.4% (1.5) | 22.1% (3.9) | 21.3% (0.9) | 14.5% (2.6) |
| Systolic blood pressure category: 130-139 | 8.8% (0.9) | 8.9% (3.6) | 5.7% (0.4) | 6.6% (1.5) |
| Systolic blood pressure category: 140-159 | 3.7% (0.6) | 3.6% (2.0) | 3.3% (0.3) | 2.5% (0.9) |
| Systolic blood pressure category: >=160 | 0.2% (0.1) | NA | 0.3% (0.1) | NA |
| Diastolic blood pressure category: <80 | 86.9% (1.6) | 83.9% (5.9) | 84.5% (0.9) | 83.1% (2.6) |
| Diastolic blood pressure category: 80-89 | 10.1% (1.4) | 15.0% (5.8) | 12.0% (0.8) | 11.7% (2.1) |
| Diastolic blood pressure category: 90-99 | 2.6% (0.5) | 1.1% (0.7) | 2.8% (0.4) | 4.2% (1.3) |
| Diastolic blood pressure category: >=100 | 0.5% (0.2) | NA | 0.6% (0.2) | 1.0% (0.7) |
| BMI category: Underweight | 2.6% (0.6) | 1.2% (1.0) | 2.0% (0.3) | 2.0% (0.9) |
| BMI category: Normal weight | 37.7% (1.9) | 44.4% (4.6) | 35.8% (1.3) | 35.9% (3.1) |
| BMI category: Overweight | 31.0% (1.8) | 19.5% (3.4) | 30.1% (0.9) | 28.6% (2.7) |
| BMI category: Obesity I | 16.5% (1.2) | 16.0% (3.8) | 18.0% (0.8) | 18.7% (2.1) |
| BMI category: Obesity II | 7.2% (0.9) | 7.1% (2.7) | 8.3% (0.6) | 8.1% (1.6) |
| BMI category: Obesity III | 5.0% (0.6) | 11.8% (2.6) | 5.8% (0.5) | 6.8% (1.2) |
| Waist-to-height ratio category: <0.5 | 33.7% (2.1) | 34.6% (5.2) | 32.3% (1.5) | 33.6% (3.1) |
| Waist-to-height ratio category: 0.5-0.59 | 41.2% (2.1) | 34.5% (6.1) | 40.3% (1.2) | 36.8% (3.6) |
| Waist-to-height ratio category: >=0.6 | 25.1% (2.2) | 30.9% (6.0) | 27.4% (1.4) | 29.6% (3.0) |
| HDL cholesterol (mg/dL) | 52.58 (0.39) | 53.35 (1.14) | 50.53 (0.42) | 51.17 (1.33) |
| Non-HDL cholesterol (mg/dL) | 136.91 (1.38) | 133.34 (5.78) | 136.43 (0.88) | 132.23 (2.68) |
| Total cholesterol (mg/dL) | 189.50 (1.16) | 186.70 (5.20) | 186.96 (0.75) | 183.40 (2.28) |
| HbA1c (%) | 5.20 (0.02) | 5.09 (0.05) | 5.31 (0.01) | 5.28 (0.03) |
| Vitamin D category: <20 | 0.3% (0.1) | 0.4% (0.5) | 1.6% (0.3) | 2.8% (0.9) |
| Vitamin D category: 20-29 | 5.2% (0.8) | 4.2% (1.7) | 6.4% (0.7) | 9.5% (1.8) |
| Vitamin D category: >=30 | 94.5% (0.9) | 95.3% (1.8) | 92.0% (0.9) | 87.6% (2.4) |
| Dietary Inflammatory Index (DII) | 1.89 (0.06) | 2.09 (0.16) | 3.31 (0.08) | 2.98 (0.14) |
| DII including alcohol | 2.03 (0.07) | 2.22 (0.17) | 3.46 (0.09) | 3.15 (0.14) |
| 1 For each characteristic, survey-weighted means (SE) are shown for continuous variables, and percentages of total within that category are shown for categorical variables. | ||||
Weighted prevalence analysis
Survey-weighted prevalence estimates were calculated for all disease outcomes, both overall and across DII quartiles (Table 4; Figure 1). Overall disease prevalence ranged from 1.3% for inflammatory arthritis to 15.9% for asthma. As compared with the weighted prevalence in the least inflammatory quartile (Q1), the weighted prevalence in the most inflammatory quartile (Q4) was significantly higher for inflammatory arthritis (2.1% vs. 0.8%; p<0.05) and significantly lower for allergic rhinitis (11.5% vs. 14.1%; p<0.05). There was a non-significant trend of higher Q4 disease prevalence for eczema and hypertension. Together, this suggested variable, modest effects of dietary inflammatory potential on disease outcomes.
# Weighted prevalence by DII quartile----------------------------------------------------------
## Outcome configurations for prevalence
outcome_list_all <- c(outcome_list, outcome_list_secondary)
compute_prev_by_dii <- function(info) {
des <- info$design
y <- info$var
label_raw <- info$label
### Re-label atopic dermatitis to eczema for consistency
label <- recode(label_raw, "Atopic dermatitis" = "Eczema")
f <- as.formula(paste0("~", y))
vars <- des$variables
rows <- list()
### Overall prevalence
est_overall <- svymean(f, design = des, na.rm = TRUE)
p_overall <- as.numeric(est_overall)[1] * 100
se_overall <- as.numeric(SE(est_overall))[1] * 100
rows[[length(rows) + 1]] <- tibble(
Outcome = label,
dii_group = "Overall",
prev_pct = p_overall,
prev_se = se_overall
)
### Prevalence by DII quartile
if ("dii_quart" %in% names(vars)) {
dii_levels <- levels(vars$dii_quart)
for (lvl in dii_levels) {
des_sub <- subset(des, dii_quart == lvl)
est <- svymean(f, design = des_sub, na.rm = TRUE)
p <- as.numeric(est)[1] * 100
se <- as.numeric(SE(est))[1] * 100
rows[[length(rows) + 1]] <- tibble(
Outcome = label,
dii_group = as.character(lvl),
prev_pct = p,
prev_se = se
)
}
}
if (length(rows) == 0L) {
return(tibble(
Outcome = character(0),
dii_group = character(0),
prev_pct = numeric(0),
prev_se = numeric(0)
))
}
bind_rows(rows)
}
prev_all_long <- map_dfr(
outcome_list_all,
compute_prev_by_dii
)
## Harmonize DII group labels and ordering of outcomes
prev_all_long <- prev_all_long |>
mutate(
dii_group_simple = case_when(
dii_group == "Overall" ~ "Overall",
grepl("^Q1", dii_group) ~ "Q1",
grepl("^Q2", dii_group) ~ "Q2",
grepl("^Q3", dii_group) ~ "Q3",
grepl("^Q4", dii_group) ~ "Q4",
TRUE ~ dii_group
),
## Set base outcome order (for tables, etc.)
Outcome = factor(
Outcome,
levels = c(
"Asthma",
"Allergic rhinitis",
"Inflammatory arthritis",
"Diabetes",
"Hypertension",
"Eczema",
"Food allergy"
)
)
)
## Compute p-values for Q2–Q4 vs Q1 using survey-weighted logistic regression
compute_prev_pvals <- function(info) {
des <- info$design
y <- info$var
label_raw <- info$label
label <- recode(label_raw, "Atopic dermatitis" = "Eczema")
vars <- des$variables
if (!("dii_quart" %in% names(vars))) {
return(tibble(
Outcome = character(0),
dii_group_simple = character(0),
p_vs_Q1 = numeric(0)
))
}
dq <- vars$dii_quart
if (!is.factor(dq)) {
dq <- factor(dq)
}
levels_dq <- levels(dq)
if (length(levels_dq) < 2) {
return(tibble(
Outcome = character(0),
dii_group_simple = character(0),
p_vs_Q1 = numeric(0)
))
}
other_levels <- levels_dq[-1]
# Fit survey-weighted logistic regression: outcome ~ dii_quart
form <- as.formula(paste0(y, " ~ dii_quart"))
fit <- tryCatch(
svyglm(form, design = des, family = quasibinomial()),
error = function(e) NULL
)
if (is.null(fit)) {
return(tibble(
Outcome = character(0),
dii_group_simple = character(0),
p_vs_Q1 = numeric(0)
))
}
coefs <- summary(fit)$coef
out_rows <- purrr::map_dfr(other_levels, function(lvl) {
term <- paste0("dii_quart", lvl)
if (!term %in% rownames(coefs)) {
p_val <- NA_real_
} else {
p_val <- coefs[term, "Pr(>|t|)"]
}
quart_simple <- case_when(
grepl("^Q1", lvl) ~ "Q1",
grepl("^Q2", lvl) ~ "Q2",
grepl("^Q3", lvl) ~ "Q3",
grepl("^Q4", lvl) ~ "Q4",
TRUE ~ lvl
)
tibble(
Outcome = label,
dii_group_simple = quart_simple,
p_vs_Q1 = as.numeric(p_val)
)
})
out_rows
}
prev_pvals <- purrr::map_dfr(
outcome_list_all,
compute_prev_pvals
)
## Join p-values back to prevalence data
prev_all_long <- prev_all_long |>
left_join(
prev_pvals,
by = c("Outcome", "dii_group_simple")
)
## Color scheme for outcomes
disease_base_cols <- c(
"Asthma" = "#5E3C99",
"Allergic rhinitis" = "#89CFF0",
"Inflammatory arthritis" = "#009E73",
"Diabetes" = "#F0E442",
"Hypertension" = "#D55E00",
"Eczema" = "#E78AC3",
"Food allergy" = "#4B2E0F"
)
## Alpha for quartiles; overall uses a mid alpha
alpha_quart <- c(
"Q1" = 0.4,
"Q2" = 0.6,
"Q3" = 0.8,
"Q4" = 1.0
)
alpha_overall <- 0.7
prev_all_long <- prev_all_long |>
mutate(
base_col = disease_base_cols[as.character(Outcome)],
fill_col = case_when(
dii_group_simple == "Overall" ~ scales::alpha(base_col, alpha_overall), # mid shade
TRUE ~ scales::alpha(base_col, alpha_quart[dii_group_simple])
),
## Pretty labels for facets, including split lines
Outcome_pretty = forcats::fct_recode(
Outcome,
"Allergic\nrhinitis" = "Allergic rhinitis",
"Inflammatory\narthritis" = "Inflammatory arthritis"
),
## Explicit facet order:
## Asthma, Allergic rhinitis, Inflammatory arthritis, Eczema,
## Food allergy, Diabetes, Hypertension
Outcome_pretty = factor(
Outcome_pretty,
levels = c(
"Asthma",
"Allergic\nrhinitis",
"Inflammatory\narthritis",
"Eczema",
"Food allergy",
"Diabetes",
"Hypertension"
)
),
# Numeric x positions: extra gap between Overall and Q1
x_pos = case_when(
dii_group_simple == "Overall" ~ 1.0,
dii_group_simple == "Q1" ~ 2.5,
dii_group_simple == "Q2" ~ 3.5,
dii_group_simple == "Q3" ~ 4.5,
dii_group_simple == "Q4" ~ 5.5,
TRUE ~ NA_real_
)
)
## Split data for quartiles vs overall
prev_quart <- prev_all_long |>
filter(dii_group_simple != "Overall")
prev_overall <- prev_all_long |>
filter(dii_group_simple == "Overall")
## Table 4: Weighted prevalence table with SEs and p-values------------------------------------
## Ensure Outcome ordering for the table as requested
prev_all_long <- prev_all_long |>
mutate(
Outcome = factor(
Outcome,
levels = c(
"Asthma",
"Allergic rhinitis",
"Inflammatory arthritis",
"Diabetes",
"Hypertension",
"Eczema",
"Food allergy"
)
)
)
table4_df <- prev_all_long |>
mutate(
Condition = case_when(
dii_group_simple == "Overall" ~ "Overall",
TRUE ~ dii_group
)
) |>
select(
Outcome,
Condition,
`Weighted prevalence (%)` = prev_pct,
`SE (%)` = prev_se,
`p value` = p_vs_Q1
) |>
arrange(Outcome, Condition)
nhanes_table4_gt <- table4_df |>
gt(groupname_col = "Outcome") |>
tab_header(
title = md("**Table 4: Weighted Prevalence of Allergic, Immunologic, and Cardiometabolic Conditions Overall and by Dietary Inflammatory Index Quartile**")
) |>
fmt_number(
columns = c(`Weighted prevalence (%)`, `SE (%)`),
decimals = 1
) |>
fmt_number(
columns = `p value`,
decimals = 3
) |>
cols_label(
Condition = md("Condition"),
`Weighted prevalence (%)` = md("Weighted prevalence (%)"),
`SE (%)` = md("SE (%)"),
`p value` = md("p value")
) |>
tab_source_note(
md(
"P values were obtained in survey-weighted logistic regression models of each outcome on DII quartile, comparing Q2 to Q4 with Q1 using Wald tests."
)
) |>
tab_options(
table.font.size = px(12),
data_row.padding = px(1),
row_group.padding = px(1),
column_labels.padding = px(1),
table.width = pct(80)
) |>
tab_style(
style = cell_text(weight = "bold"),
locations = list(
cells_column_labels(everything()),
cells_row_groups()
)
)
## Weighted prevalence plot--------------------------------------------------------------------
weighted_prev_plot_all <- ggplot() +
### Quartile bars
geom_col(
data = prev_quart,
aes(
x = x_pos,
y = prev_pct,
fill = fill_col
),
width = 0.9
) +
### Overall bars: mid-shade fill, thick dark border
geom_col(
data = prev_overall,
aes(
x = x_pos,
y = prev_pct,
fill = fill_col,
color = base_col # darkest outline
),
width = 0.9,
size = 0.9
) +
facet_wrap(~ Outcome_pretty, ncol = 3, scales = "free_y") +
scale_fill_identity(guide = "none") +
scale_color_identity(guide = "none") +
scale_x_continuous(
breaks = c(1.0, 2.5, 3.5, 4.5, 5.5),
labels = c("Overall", "Q1", "Q2", "Q3", "Q4"),
expand = expansion(mult = c(0.05, 0.05))
) +
labs(
x = NULL,
y = "Weighted prevalence (%)",
title = "Figure 1: Weighted Prevalence of Allergic, Immunologic,\nCardiometabolic Conditions,\nOverall and by Dietary Inflammatory Index Quartile"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 15, hjust = 0.5),
strip.text = element_text(face = "bold", size = 12), # one size smaller
axis.title.y = element_text(face = "bold", size = 12),
axis.title.x = element_blank(),
axis.text.x = element_text(size = 11, angle = 45, hjust = 1),
axis.text.y = element_text(size = 11),
aspect.ratio = 1,
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.major.y = element_line(color = "grey90"),
panel.grid.minor.y = element_blank()
)
## Print Table 4 and Figure 1------------------------------------------------------------------
nhanes_table4_gt| Table 4: Weighted Prevalence of Allergic, Immunologic, and Cardiometabolic Conditions Overall and by Dietary Inflammatory Index Quartile | |||
|---|---|---|---|
| Condition | Weighted prevalence (%) | SE (%) | p value |
| Asthma | |||
| Overall | 15.9 | 0.6 | NA |
| Q1 (most anti-inflammatory) | 16.1 | 1.2 | NA |
| Q2 | 14.9 | 1.1 | 0.412 |
| Q3 | 16.1 | 1.2 | 0.994 |
| Q4 (most pro-inflammatory) | 17.0 | 1.0 | 0.566 |
| Allergic rhinitis | |||
| Overall | 11.9 | 0.5 | NA |
| Q1 (most anti-inflammatory) | 14.1 | 1.1 | NA |
| Q2 | 11.6 | 1.0 | 0.103 |
| Q3 | 11.5 | 1.0 | 0.082 |
| Q4 (most pro-inflammatory) | 11.5 | 0.8 | 0.035 |
| Inflammatory arthritis | |||
| Overall | 1.3 | 0.1 | NA |
| Q1 (most anti-inflammatory) | 0.8 | 0.3 | NA |
| Q2 | 1.6 | 0.4 | 0.063 |
| Q3 | 0.9 | 0.2 | 0.721 |
| Q4 (most pro-inflammatory) | 2.1 | 0.5 | 0.013 |
| Diabetes | |||
| Overall | 2.0 | 0.2 | NA |
| Q1 (most anti-inflammatory) | 2.1 | 0.3 | NA |
| Q2 | 1.3 | 0.3 | 0.092 |
| Q3 | 2.5 | 0.4 | 0.414 |
| Q4 (most pro-inflammatory) | 2.4 | 0.4 | 0.527 |
| Hypertension | |||
| Overall | 11.3 | 0.5 | NA |
| Q1 (most anti-inflammatory) | 10.5 | 0.9 | NA |
| Q2 | 10.5 | 1.0 | 0.994 |
| Q3 | 11.6 | 1.0 | 0.403 |
| Q4 (most pro-inflammatory) | 13.1 | 0.9 | 0.062 |
| Eczema | |||
| Overall | 7.6 | 0.8 | NA |
| Q1 (most anti-inflammatory) | 7.2 | 1.5 | NA |
| Q2 | 6.9 | 1.2 | 0.866 |
| Q3 | 8.0 | 1.2 | 0.673 |
| Q4 (most pro-inflammatory) | 11.4 | 4.3 | 0.308 |
| Food allergy | |||
| Overall | 10.1 | 0.5 | NA |
| Q1 (most anti-inflammatory) | 11.9 | 1.4 | NA |
| Q2 | 9.3 | 1.0 | 0.130 |
| Q3 | 11.6 | 1.2 | 0.882 |
| Q4 (most pro-inflammatory) | 9.1 | 0.8 | 0.085 |
| P values were obtained in survey-weighted logistic regression models of each outcome on DII quartile, comparing Q2 to Q4 with Q1 using Wald tests. | |||
weighted_prev_plot_allUnivariable regression analyses
Survey-weighted univariable logistic regression analyses were performed to examine associations between core sociodemographic, behavioral, clinical, and dietary covariates (Tables 5-6) or individual nutrient quartile covariates (Tables 7-8) and disease outcomes. Results for asthma, allergic rhinitis, inflammatory arthritis, diabetes, and hypertension are shown in Table 5 and Table 7. Results for eczema and food allergy are shown in Table 6 and Table 8.
Among the core covariates, the odds of allergic rhinitis increased with age (OR 1.06, 95% CI 1.05–1.08; p<0.001), while asthma decreased with age (OR 0.97, 95% CI 0.96–0.99; p<0.001). Female participants had higher odds of asthma (OR 1.35, 95% CI 1.17–1.56; p<0.001), allergic rhinitis (OR 1.39, 95% CI 1.19–1.61; p<0.001), and inflammatory arthritis (OR 2.31, 95% CI 1.53–3.49; p<0.001). Non-Hispanic Black (OR 0.62, 95% CI 0.50–0.78; p<0.001) and Mexican American (OR 0.35, 95% CI 0.27–0.45; p<0.001) participants had lower odds of allergic rhinitis, and Mexican American participants (OR 0.42, 95% CI 0.32–0.55; p<0.001) also had lower odds of asthma. Smoking increased the odds of inflammatory arthritis (OR 2.64, 95% CI 1.63–4.27; p<0.001). Among secondary outcomes, there was a trend of higher odds of eczema associated with obesity class III (OR 2.22, 95% CI 1.24–3.97; p=0.088), while Mexican American participants had lower odds of food allergy (OR 0.62, 95% CI 0.48–0.81; p=0.013). Regarding dietary inflammatory exposures, DII inclusive of alcohol intake was significantly associated with lower odds of allergic rhinitis (OR 0.96, 95% CI 0.92–0.99; p=0.044) and higher odds of hypertension (OR 1.05, 95% CI 1.01–1.08; p=0.049). In general, DII associations were not significant. Cardiometabolic outcomes were strongly associated with age and adiposity, for example increased odds of hypertension with age (OR 1.09, 95% CI 1.07–1.10; p<0.001), obesity class III (OR 7.15, 95% CI 5.10–10.02; p<0.001), and elevated waist-to-height ratio (OR 4.56, 95% CI 3.49–5.97; p<0.001).
In nutrient-specific analyses, there was a large degree of variability across conditions, and many associations did not achieve statistical significance after correction for multiple comparisons. Nevertheless, several significant associations were identified. For inflammatory arthritis, intakes of carbohydrates (Q3 OR 0.26, 95% CI 0.13–0.49; p=0.019) and total energy (Q3 OR 0.31, 95% CI 0.16–0.60; p=0.047) within the third quartile of the intake distribution were associated with lower odds of disease. Several non-significant trends were observed for cardiometabolic outcomes. For example, higher saturated fat intake (Q4 OR 1.29, 95% CI 1.04–1.60; p=0.218) and higher cholesterol intake (Q4 OR 1.39, 95% CI 1.11–1.73; p=0.095) showed a trend of increased odds of hypertension.
# Univariable analysis 1: core covariates------------------------------------------------------
## Define univariable analysis 1 predictors
uni1_predictors <- c(
"age",
"sex",
"race",
"insur",
"educ",
"pir",
"food_sec",
"smoking",
"phys_act",
"alcohol",
"sbp_cat",
"dbp_cat",
"bmi_cat",
"whtr_cat",
"hdl",
"non_hdl",
"tchol",
"hba1c",
"vitD25oh_cat",
"vitD25oh_cat_trend",
"dii",
"dii_quart_trend",
"dii_etoh"
)
uni1_predictor_order <- uni1_predictors
## Helper to clean Vitamin D and DII labels
clean_vitD_DII_labels <- function(df) {
df |>
mutate(
variable_label = as.character(variable_label),
variable_label = str_replace_all(variable_label, "25\\(OH\\)D", "Vitamin D"),
variable_label = str_replace(
variable_label,
"Vitamin D category trend \\(0–2\\)",
"Vitamin D category trend"
),
variable_label = str_replace(
variable_label,
"category trend \\(0–2\\)",
"category trend"
),
variable_label = str_replace(
variable_label,
"Dietary Inflammatory Index \\(per unit\\)",
"DII"
),
variable_label = str_replace(
variable_label,
"DII including alcohol \\(per unit\\)",
"DII including alcohol"
),
variable_label = str_replace(
variable_label,
"DII quartile trend \\(0–3\\)",
"DII quartile trend"
),
variable_label = factor(variable_label)
)
}
## Helper function for univariable tables in gt
make_uni_gt <- function(data, title_text) {
data |>
gt() |>
tab_header(
title = md(title_text)
) |>
fmt_markdown(columns = everything()) |>
tab_options(table.font.size = px(10)) |>
tab_style(
style = cell_text(weight = "bold"),
locations = cells_column_labels(everything())
) |>
tab_footnote(
footnote = "p values were corrected for multiple comparisons using the Benjamini-Hochberg false discovery rate (FDR) method",
locations = cells_column_labels(columns = contains("p value"))
)
}
## Univariable analysis 1 helper function
run_uni_model1 <- function(outcome_name, outcome_info, predictor) {
des <- outcome_info$design
y <- outcome_info$var
if (!predictor %in% names(des$variables)) {
warning("Predictor ", predictor, " not found in design for outcome ", outcome_name, ". Skipping.")
return(tibble())
}
form_uni <- as.formula(paste0(y, " ~ ", predictor))
fit <- svyglm(
form_uni,
design = des,
family = quasibinomial()
)
td <- tidy(fit, exponentiate = TRUE, conf.int = TRUE) |>
filter(term != "(Intercept)") |>
mutate(
outcome = outcome_name,
outcome_lab = outcome_info$label,
predictor = predictor
)
base_label <- uni1_labels[[predictor]]
if (is.null(base_label)) base_label <- predictor
add_ref_and_label(
td = td,
des = des,
predictor = predictor,
outcome_name = outcome_name,
outcome_info = outcome_info,
base_label = base_label
)
}
## Univariable analysis 1 primary outcomes
uni1_primary_long <- map_dfr(
names(outcome_list),
function(out_nm) {
info <- outcome_list[[out_nm]]
map_dfr(uni1_predictors, ~ run_uni_model1(out_nm, info, .x))
}
)
uni1_primary_long <- uni1_primary_long |>
rename(
OR = estimate,
LCL = conf.low,
UCL = conf.high,
p = p.value
) |>
mutate(
p_fdr = p.adjust(p, method = "BH"),
or_ci = if_else(
is.na(p) & OR == 1 & LCL == 1 & UCL == 1,
"1 (Ref)",
sprintf("%.2f (%.2f, %.2f)", OR, LCL, UCL)
),
p_fdr_fmt = ifelse(is.na(p_fdr), "", pvalue(p_fdr, accuracy = 0.001))
) |>
# Use predictor and variable_label to pull in ranks for ordering
left_join(custom_order_uni1, by = c("predictor", "variable_label")) |>
mutate(
predictor = factor(predictor, levels = uni1_predictor_order),
rank = coalesce(rank, 1000L)
) |>
clean_vitD_DII_labels() |>
# Override rank for specific predictors/levels to enforce desired order
mutate(
rank2 = case_when(
# Education: <9, 9–12, >12
predictor == "educ" & str_detect(variable_label, "Education: <9 years") ~ 1L,
predictor == "educ" & str_detect(variable_label, "Education: 9–12 years") ~ 2L,
predictor == "educ" & str_detect(variable_label, "Education: >12 years") ~ 3L,
# BMI: Underweight, Normal, Overweight, Obesity I, II, III
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Underweight") ~ 1L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Normal") ~ 2L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Overweight") ~ 3L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Obesity I") ~ 4L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Obesity II") ~ 5L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Obesity III")~ 6L,
# Vitamin D category: <20, 20–29, ≥30
predictor == "vitD25oh_cat" & str_detect(variable_label, "Vitamin D category: <20") ~ 1L,
predictor == "vitD25oh_cat" & str_detect(variable_label, "Vitamin D category: 20–29") ~ 2L,
predictor == "vitD25oh_cat" & str_detect(variable_label, "Vitamin D category: ≥30") ~ 3L,
# Systolic BP: <120, 120–129, 130–139, 140–159, ≥160
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: <120") ~ 1L,
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: 120–129") ~ 2L,
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: 130–139") ~ 3L,
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: 140–159") ~ 4L,
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: ≥160") ~ 5L,
# Diastolic BP: <80, 80–89, 90–99, ≥100
predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: <80") ~ 1L,
predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: 80–89") ~ 2L,
predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: 90–99") ~ 3L,
predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: ≥100") ~ 4L,
# Waist-to-height ratio: <0.5, 0.5–0.59, ≥0.6
predictor == "whtr_cat" & str_detect(variable_label, "Waist-to-height ratio category: <0.5") ~ 1L,
predictor == "whtr_cat" & str_detect(variable_label, "Waist-to-height ratio category: 0.5–0.59") ~ 2L,
predictor == "whtr_cat" & str_detect(variable_label, "Waist-to-height ratio category: ≥0.6") ~ 3L,
TRUE ~ rank
)
) |>
arrange(predictor, rank2, variable_label, outcome_lab)
uni1_primary_wide <- uni1_primary_long |>
select(
variable_label,
outcome_lab,
or_ci,
p_fdr_fmt
) |>
pivot_wider(
id_cols = variable_label,
names_from = outcome_lab,
values_from = c(or_ci, p_fdr_fmt),
names_glue = "{outcome_lab} {.value}"
) |>
rename_with(~ str_replace(.x, " or_ci", " OR (95% CI)")) |>
rename_with(~ str_replace(.x, " p_fdr_fmt", " p value")) |>
rename(Variable = variable_label)
uni1_primary_outcomes_order <- c(
"Asthma",
"Allergic rhinitis",
"Inflammatory arthritis",
"Diabetes",
"Hypertension"
)
uni1_primary_col_order <- c(
"Variable",
as.vector(rbind(
paste0(uni1_primary_outcomes_order, " OR (95% CI)"),
paste0(uni1_primary_outcomes_order, " p value")
))
)
uni1_primary_wide <- uni1_primary_wide |>
select(all_of(uni1_primary_col_order))
tbl_uni1_primary <- make_uni_gt(
uni1_primary_wide,
"**Table 5: Univariable Associations of Core Covariates With Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension**"
)
## Univariable analysis 1 secondary outcomes
uni1_secondary_long <- map_dfr(
names(outcome_list_secondary),
function(out_nm) {
info <- outcome_list_secondary[[out_nm]]
map_dfr(uni1_predictors, ~ run_uni_model1(out_nm, info, .x))
}
)
uni1_secondary_long <- uni1_secondary_long |>
rename(
OR = estimate,
LCL = conf.low,
UCL = conf.high,
p = p.value
) |>
mutate(
p_fdr = p.adjust(p, method = "BH"),
or_ci = if_else(
is.na(p) & OR == 1 & LCL == 1 & UCL == 1,
"1 (Ref)",
sprintf("%.2f (%.2f, %.2f)", OR, LCL, UCL)
),
p_fdr_fmt = ifelse(is.na(p_fdr), "", pvalue(p_fdr, accuracy = 0.001))
) |>
left_join(custom_order_uni1, by = c("predictor", "variable_label")) |>
mutate(
predictor = factor(predictor, levels = uni1_predictor_order),
rank = coalesce(rank, 1000L)
) |>
clean_vitD_DII_labels() |>
mutate(
rank2 = case_when(
# Education: <9, 9–12, >12
predictor == "educ" & str_detect(variable_label, "Education: <9 years") ~ 1L,
predictor == "educ" & str_detect(variable_label, "Education: 9–12 years") ~ 2L,
predictor == "educ" & str_detect(variable_label, "Education: >12 years") ~ 3L,
# BMI: Underweight, Normal, Overweight, Obesity I, II, III
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Underweight") ~ 1L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Normal") ~ 2L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Overweight") ~ 3L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Obesity I") ~ 4L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Obesity II") ~ 5L,
predictor == "bmi_cat" & str_detect(variable_label, "BMI category: Obesity III")~ 6L,
# Vitamin D category: <20, 20–29, ≥30
predictor == "vitD25oh_cat" & str_detect(variable_label, "Vitamin D category: <20") ~ 1L,
predictor == "vitD25oh_cat" & str_detect(variable_label, "Vitamin D category: 20–29") ~ 2L,
predictor == "vitD25oh_cat" & str_detect(variable_label, "Vitamin D category: ≥30") ~ 3L,
# Systolic BP: <120, 120–129, 130–139, 140–159, ≥160
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: <120") ~ 1L,
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: 120–129") ~ 2L,
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: 130–139") ~ 3L,
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: 140–159") ~ 4L,
predictor == "sbp_cat" & str_detect(variable_label, "Systolic blood pressure category: ≥160") ~ 5L,
# Diastolic BP: <80, 80–89, 90–99, ≥100
predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: <80") ~ 1L,
predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: 80–89") ~ 2L,
predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: 90–99") ~ 3L,
predictor == "dbp_cat" & str_detect(variable_label, "Diastolic blood pressure category: ≥100") ~ 4L,
# Waist-to-height ratio: <0.5, 0.5–0.59, ≥0.6
predictor == "whtr_cat" & str_detect(variable_label, "Waist-to-height ratio category: <0.5") ~ 1L,
predictor == "whtr_cat" & str_detect(variable_label, "Waist-to-height ratio category: 0.5–0.59") ~ 2L,
predictor == "whtr_cat" & str_detect(variable_label, "Waist-to-height ratio category: ≥0.6") ~ 3L,
TRUE ~ rank
)
) |>
arrange(predictor, rank2, variable_label, outcome_lab)
uni1_secondary_wide <- uni1_secondary_long |>
select(
variable_label,
outcome_lab,
or_ci,
p_fdr_fmt
) |>
pivot_wider(
id_cols = variable_label,
names_from = outcome_lab,
values_from = c(or_ci, p_fdr_fmt),
names_glue = "{outcome_lab} {.value}"
) |>
rename_with(~ str_replace(.x, " or_ci", " OR (95% CI)")) |>
rename_with(~ str_replace(.x, " p_fdr_fmt", " p value")) |>
rename(Variable = variable_label)
uni1_secondary_outcomes_order <- c(
"Atopic dermatitis",
"Food allergy"
)
uni1_secondary_col_order <- c(
"Variable",
as.vector(rbind(
paste0(uni1_secondary_outcomes_order, " OR (95% CI)"),
paste0(uni1_secondary_outcomes_order, " p value")
))
)
uni1_secondary_wide <- uni1_secondary_wide |>
select(all_of(uni1_secondary_col_order))
tbl_uni1_secondary <- make_uni_gt(
uni1_secondary_wide,
"**Table 6: Univariable Associations of Core Covariates With Eczema and Food Allergy**"
)
# Univariable modeling 2: individual DII nutrient quartiles------------------------------------
## Define univariable analysis 2 predictors
dii_nutrient_bases <- c(
"kcal", "carb", "protein", "totalfat", "satfat",
"pufa", "mufa", "n3fat", "n6fat", "choles",
"vitA", "bcarotene", "vitC", "vitE",
"vitB6", "vitB12", "folicacid",
"thiamin", "riboflavin", "niacin",
"iron", "mg", "zn", "se",
"fiber", "caffeine"
)
nutrient_labels <- c(
kcal = "Total energy intake (kcal)",
carb = "Carbohydrate intake",
protein = "Protein intake",
totalfat = "Total fat intake",
satfat = "Saturated fat intake",
pufa = "Polyunsaturated fat intake",
mufa = "Monounsaturated fat intake",
n3fat = "Omega-3 fat intake",
n6fat = "Omega-6 fat intake",
choles = "Cholesterol intake",
vitA = "Vitamin A intake",
bcarotene = "Beta-carotene intake",
vitC = "Vitamin C intake",
vitE = "Vitamin E intake",
vitB6 = "Vitamin B6 intake",
vitB12 = "Vitamin B12 intake",
folicacid = "Folic acid intake",
thiamin = "Thiamin intake",
riboflavin = "Riboflavin intake",
niacin = "Niacin intake",
iron = "Iron intake",
mg = "Magnesium intake",
zn = "Zinc intake",
se = "Selenium intake",
fiber = "Fiber intake",
caffeine = "Caffeine intake"
)
dii_nutrient_quart_predictors <- paste0(dii_nutrient_bases, "_quart")
## Univariable analysis 2 helper function
run_uni_model2 <- function(outcome_name, outcome_info, predictor) {
des <- outcome_info$design
y <- outcome_info$var
if (!predictor %in% names(des$variables)) {
warning("Predictor ", predictor, " not found in design for outcome ", outcome_name, ". Skipping.")
return(tibble())
}
form_uni <- as.formula(paste0(y, " ~ ", predictor))
fit <- svyglm(
form_uni,
design = des,
family = quasibinomial()
)
td <- tidy(fit, exponentiate = TRUE, conf.int = TRUE) |>
filter(term != "(Intercept)") |>
mutate(
outcome = outcome_name,
outcome_lab = outcome_info$label,
predictor = predictor
)
base <- sub("_quart$", "", predictor)
base_label <- nutrient_labels[[base]]
if (is.null(base_label)) base_label <- base
var_data <- des$variables[[predictor]]
if (is.character(var_data)) {
var_data <- factor(var_data)
}
if (is.factor(var_data)) {
levels_vec <- levels(var_data)
pattern <- paste0("^", predictor)
td_nonref <- td |>
mutate(
level = sub(pattern, "", term),
variable_label = paste0(base_label, ": ", level)
)
ref_level <- levels_vec[1]
ref_label <- paste0(base_label, ": ", ref_level)
ref_row <- tibble(
term = paste0(predictor, ref_level),
estimate = 1,
conf.low = 1,
conf.high = 1,
p.value = NA_real_,
outcome = outcome_name,
outcome_lab = outcome_info$label,
predictor = predictor,
level = ref_level,
variable_label = ref_label
)
bind_rows(ref_row, td_nonref)
} else {
td |>
mutate(variable_label = base_label)
}
}
## Univariable analysis 2 primary outcomes
uni2_primary_long <- map_dfr(
names(outcome_list),
function(out_nm) {
info <- outcome_list[[out_nm]]
map_dfr(dii_nutrient_quart_predictors, ~ run_uni_model2(out_nm, info, .x))
}
)
uni2_primary_long <- uni2_primary_long |>
rename(
OR = estimate,
LCL = conf.low,
UCL = conf.high,
p = p.value
) |>
mutate(
p_fdr = p.adjust(p, method = "BH"),
or_ci = if_else(
is.na(p) & OR == 1 & LCL == 1 & UCL == 1,
"1 (Ref)",
sprintf("%.2f (%.2f, %.2f)", OR, LCL, UCL)
),
p_fdr_fmt = ifelse(is.na(p_fdr), "", pvalue(p_fdr, accuracy = 0.001))
) |>
mutate(
base = sub("_quart$", "", predictor),
base = factor(base, levels = dii_nutrient_bases),
variable_label = factor(variable_label)
) |>
arrange(base, variable_label, outcome_lab)
uni2_primary_wide <- uni2_primary_long |>
select(
variable_label,
outcome_lab,
or_ci,
p_fdr_fmt
) |>
pivot_wider(
id_cols = variable_label,
names_from = outcome_lab,
values_from = c(or_ci, p_fdr_fmt),
names_glue = "{outcome_lab} {.value}"
) |>
rename_with(~ str_replace(.x, " or_ci", " OR (95% CI)")) |>
rename_with(~ str_replace(.x, " p_fdr_fmt", " p value")) |>
rename(Variable = variable_label)
uni2_primary_outcomes_order <- c(
"Asthma",
"Allergic rhinitis",
"Inflammatory arthritis",
"Diabetes",
"Hypertension"
)
uni2_primary_col_order <- c(
"Variable",
as.vector(rbind(
paste0(uni2_primary_outcomes_order, " OR (95% CI)"),
paste0(uni2_primary_outcomes_order, " p value")
))
)
uni2_primary_wide <- uni2_primary_wide |>
select(all_of(uni2_primary_col_order))
tbl_uni2_primary <- make_uni_gt(
uni2_primary_wide,
"**Table 7: Univariable Associations of Individual Nutritional Covariates With Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension**"
)
## Univariable analysis 2 secondary outcomes
uni2_secondary_long <- map_dfr(
names(outcome_list_secondary),
function(out_nm) {
info <- outcome_list_secondary[[out_nm]]
map_dfr(dii_nutrient_quart_predictors, ~ run_uni_model2(out_nm, info, .x))
}
)
uni2_secondary_long <- uni2_secondary_long |>
rename(
OR = estimate,
LCL = conf.low,
UCL = conf.high,
p = p.value
) |>
mutate(
p_fdr = p.adjust(p, method = "BH"),
or_ci = if_else(
is.na(p) & OR == 1 & LCL == 1 & UCL == 1,
"1 (Ref)",
sprintf("%.2f (%.2f, %.2f)", OR, LCL, UCL)
),
p_fdr_fmt = ifelse(is.na(p_fdr), "", pvalue(p_fdr, accuracy = 0.001))
) |>
mutate(
base = sub("_quart$", "", predictor),
base = factor(base, levels = dii_nutrient_bases),
variable_label = factor(variable_label)
) |>
arrange(base, variable_label, outcome_lab)
uni2_secondary_wide <- uni2_secondary_long |>
select(
variable_label,
outcome_lab,
or_ci,
p_fdr_fmt
) |>
pivot_wider(
id_cols = variable_label,
names_from = outcome_lab,
values_from = c(or_ci, p_fdr_fmt),
names_glue = "{outcome_lab} {.value}"
) |>
rename_with(~ str_replace(.x, " or_ci", " OR (95% CI)")) |>
rename_with(~ str_replace(.x, " p_fdr_fmt", " p value")) |>
rename(Variable = variable_label)
uni2_secondary_outcomes_order <- c(
"Atopic dermatitis",
"Food allergy"
)
uni2_secondary_col_order <- c(
"Variable",
as.vector(rbind(
paste0(uni2_secondary_outcomes_order, " OR (95% CI)"),
paste0(uni2_secondary_outcomes_order, " p value")
))
)
uni2_secondary_wide <- uni2_secondary_wide |>
select(all_of(uni2_secondary_col_order))
tbl_uni2_secondary <- make_uni_gt(
uni2_secondary_wide,
"**Table 8: Univariable Associations of Individual Nutritional Covariates With Eczema and Food Allergy**"
)
# Print univariable analysis tables------------------------------------------------------------
tbl_uni1_primary| Table 5: Univariable Associations of Core Covariates With Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Variable | Asthma OR (95% CI) | Asthma p value1 | Allergic rhinitis OR (95% CI) | Allergic rhinitis p value1 | Inflammatory arthritis OR (95% CI) | Inflammatory arthritis p value1 | Diabetes OR (95% CI) | Diabetes p value1 | Hypertension OR (95% CI) | Hypertension p value1 |
| Age (years) | 0.97 (0.96, 0.99) | <0.001 | 1.06 (1.05, 1.08) | <0.001 | 1.10 (1.05, 1.15) | <0.001 | 1.11 (1.07, 1.14) | <0.001 | 1.09 (1.07, 1.10) | <0.001 |
| Sex: Male | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Sex: Female | 1.35 (1.17, 1.56) | <0.001 | 1.39 (1.19, 1.61) | <0.001 | 2.31 (1.53, 3.49) | <0.001 | 1.25 (0.85, 1.83) | 0.363 | 0.83 (0.71, 0.97) | 0.050 |
| Race/ethnicity: Non-Hispanic White | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Race/ethnicity: Non-Hispanic Black | 1.04 (0.87, 1.24) | 0.799 | 0.62 (0.50, 0.78) | <0.001 | 0.91 (0.52, 1.58) | 0.833 | 1.96 (1.30, 2.95) | 0.007 | 1.69 (1.39, 2.05) | <0.001 |
| Race/ethnicity: Mexican American | 0.42 (0.32, 0.55) | <0.001 | 0.35 (0.27, 0.45) | <0.001 | 0.50 (0.26, 0.96) | 0.088 | 1.04 (0.66, 1.64) | 0.912 | 0.80 (0.62, 1.03) | 0.151 |
| Race/ethnicity: Other Hispanic | 0.77 (0.57, 1.03) | 0.144 | 0.62 (0.42, 0.92) | 0.049 | 0.48 (0.20, 1.16) | 0.182 | 1.66 (1.00, 2.75) | 0.101 | 0.82 (0.60, 1.11) | 0.296 |
| Insurance: No | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Insurance: Yes | 1.25 (1.05, 1.50) | 0.043 | 1.46 (1.19, 1.79) | 0.002 | 2.22 (1.27, 3.89) | 0.020 | 1.40 (0.95, 2.07) | 0.155 | 1.31 (1.12, 1.54) | 0.004 |
| Education: <9 years | 0.36 (0.20, 0.65) | 0.004 | 0.44 (0.27, 0.71) | 0.004 | 1.80 (0.63, 5.11) | 0.376 | 1.47 (0.67, 3.21) | 0.450 | 0.76 (0.48, 1.20) | 0.345 |
| Education: 9–12 years | 0.95 (0.81, 1.12) | 0.690 | 0.62 (0.52, 0.75) | <0.001 | 1.28 (0.79, 2.08) | 0.421 | 1.27 (0.87, 1.87) | 0.327 | 1.12 (0.95, 1.33) | 0.269 |
| Education: >12 years | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Poverty-income ratio | 0.95 (0.90, 1.01) | 0.190 | 1.12 (1.06, 1.18) | <0.001 | 0.84 (0.73, 0.96) | 0.040 | 0.87 (0.79, 0.96) | 0.019 | 0.95 (0.90, 1.00) | 0.109 |
| Food security: Full | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Food security: Marginal | 1.24 (0.93, 1.65) | 0.238 | 0.71 (0.56, 0.90) | 0.020 | 1.78 (1.01, 3.11) | 0.095 | 1.59 (0.88, 2.86) | 0.210 | 1.34 (1.03, 1.74) | 0.072 |
| Food security: Low | 0.90 (0.60, 1.35) | 0.747 | 0.63 (0.33, 1.18) | 0.238 | 1.08 (0.31, 3.73) | 0.948 | 1.02 (0.37, 2.77) | 0.984 | 0.97 (0.61, 1.55) | 0.948 |
| Food security: Very low | 2.53 (1.78, 3.59) | <0.001 | 0.60 (0.36, 0.99) | 0.099 | 3.16 (1.10, 9.07) | 0.076 | 2.91 (0.91, 9.31) | 0.136 | 1.04 (0.48, 2.27) | 0.952 |
| Smoking status: Never | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Smoking status: Current | 1.48 (1.22, 1.79) | <0.001 | 0.96 (0.76, 1.21) | 0.833 | 2.64 (1.63, 4.27) | <0.001 | 1.04 (0.69, 1.57) | 0.910 | 1.60 (1.28, 2.00) | <0.001 |
| Smoking status: Former | 1.04 (0.81, 1.34) | 0.833 | 1.26 (0.95, 1.69) | 0.190 | 1.42 (0.70, 2.88) | 0.445 | 0.89 (0.46, 1.73) | 0.833 | 1.48 (1.14, 1.93) | 0.014 |
| Physical activity: Sedentary | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Physical activity: Moderate | 1.14 (0.94, 1.38) | 0.296 | 1.45 (1.18, 1.78) | 0.002 | 1.48 (0.90, 2.44) | 0.210 | 1.04 (0.69, 1.58) | 0.912 | 1.03 (0.83, 1.28) | 0.861 |
| Physical activity: Intense | 1.13 (0.97, 1.31) | 0.215 | 1.23 (1.02, 1.47) | 0.073 | 0.52 (0.29, 0.94) | 0.073 | 0.41 (0.27, 0.62) | <0.001 | 0.74 (0.58, 0.93) | 0.033 |
| Alcohol use: No | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Alcohol use: Yes | 1.26 (1.02, 1.55) | 0.073 | 1.21 (1.01, 1.46) | 0.095 | 0.70 (0.43, 1.13) | 0.238 | 0.63 (0.42, 0.95) | 0.069 | 1.12 (0.93, 1.36) | 0.333 |
| Systolic blood pressure category: <120 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Systolic blood pressure category: 120–129 | 1.12 (0.93, 1.36) | 0.336 | 0.91 (0.74, 1.12) | 0.486 | 0.76 (0.34, 1.72) | 0.634 | 2.88 (1.82, 4.56) | <0.001 | 2.98 (2.31, 3.83) | <0.001 |
| Systolic blood pressure category: 130–139 | 0.91 (0.70, 1.19) | 0.634 | 0.88 (0.60, 1.29) | 0.634 | 1.17 (0.57, 2.43) | 0.786 | 3.29 (1.70, 6.39) | 0.003 | 5.86 (4.21, 8.16) | <0.001 |
| Systolic blood pressure category: 140–159 | 0.94 (0.60, 1.45) | 0.843 | 0.85 (0.59, 1.24) | 0.522 | 0.00 (0.00, 0.00) | <0.001 | 2.55 (1.39, 4.68) | 0.011 | 8.90 (6.12, 12.92) | <0.001 |
| Systolic blood pressure category: ≥160 | 1.24 (0.44, 3.55) | 0.799 | 0.65 (0.18, 2.27) | 0.631 | 0.00 (0.00, 0.00) | <0.001 | 9.29 (1.92, 45.01) | 0.020 | 50.34 (21.90, 115.71) | <0.001 |
| Diastolic blood pressure category: <80 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Diastolic blood pressure category: 80–89 | 1.11 (0.89, 1.39) | 0.483 | 1.31 (0.99, 1.73) | 0.116 | 0.97 (0.48, 1.95) | 0.963 | 2.81 (1.86, 4.24) | <0.001 | 3.14 (2.48, 3.98) | <0.001 |
| Diastolic blood pressure category: 90–99 | 0.89 (0.56, 1.42) | 0.757 | 0.91 (0.57, 1.48) | 0.832 | 2.08 (0.61, 7.14) | 0.348 | 3.82 (1.91, 7.63) | 0.001 | 6.09 (4.47, 8.30) | <0.001 |
| Diastolic blood pressure category: ≥100 | 1.10 (0.58, 2.09) | 0.843 | 0.39 (0.11, 1.37) | 0.234 | 0.00 (0.00, 0.00) | <0.001 | 1.43 (0.30, 6.68) | 0.780 | 13.34 (7.50, 23.73) | <0.001 |
| BMI category: Underweight (<18.5) | 0.54 (0.30, 0.97) | 0.089 | 0.73 (0.41, 1.31) | 0.404 | 0.27 (0.03, 2.40) | 0.348 | 0.00 (0.00, 0.00) | <0.001 | 0.58 (0.23, 1.42) | 0.338 |
| BMI category: Normal (18.5–24.9) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| BMI category: Overweight (25–29.9) | 0.98 (0.82, 1.18) | 0.896 | 0.93 (0.71, 1.21) | 0.701 | 1.00 (0.41, 2.45) | 0.994 | 1.71 (0.98, 2.99) | 0.116 | 1.73 (1.29, 2.30) | 0.001 |
| BMI category: Obesity I (30–34.9) | 1.25 (0.98, 1.58) | 0.138 | 0.87 (0.69, 1.11) | 0.370 | 2.02 (0.90, 4.51) | 0.157 | 2.07 (1.10, 3.89) | 0.064 | 3.33 (2.60, 4.27) | <0.001 |
| BMI category: Obesity II (35–39.9) | 1.46 (1.09, 1.97) | 0.038 | 1.00 (0.70, 1.42) | 0.994 | 2.39 (1.28, 4.47) | 0.021 | 4.71 (2.29, 9.71) | <0.001 | 4.89 (3.71, 6.44) | <0.001 |
| BMI category: Obesity III (≥40) | 1.63 (1.19, 2.22) | 0.010 | 0.91 (0.60, 1.38) | 0.783 | 3.12 (1.34, 7.28) | 0.028 | 10.89 (6.50, 18.26) | <0.001 | 7.04 (5.26, 9.41) | <0.001 |
| Waist-to-height ratio category: <0.5 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Waist-to-height ratio category: 0.5–0.59 | 0.89 (0.74, 1.08) | 0.363 | 1.00 (0.82, 1.22) | 0.994 | 1.13 (0.52, 2.47) | 0.837 | 1.03 (0.52, 2.02) | 0.968 | 2.01 (1.56, 2.59) | <0.001 |
| Waist-to-height ratio category: ≥0.6 | 1.32 (1.08, 1.61) | 0.021 | 1.02 (0.81, 1.29) | 0.913 | 1.97 (0.96, 4.04) | 0.125 | 5.14 (2.75, 9.63) | <0.001 | 4.87 (3.89, 6.10) | <0.001 |
| HDL cholesterol (mg/dL) | 1.00 (1.00, 1.01) | 0.833 | 1.00 (1.00, 1.01) | 0.445 | 1.00 (0.98, 1.02) | 0.970 | 0.98 (0.97, 1.00) | 0.082 | 0.97 (0.96, 0.98) | <0.001 |
| Non-HDL cholesterol (mg/dL) | 1.00 (1.00, 1.00) | 0.017 | 1.00 (1.00, 1.00) | 0.531 | 1.00 (1.00, 1.01) | 0.333 | 1.01 (1.01, 1.01) | <0.001 | 1.01 (1.01, 1.01) | <0.001 |
| Total cholesterol (mg/dL) | 1.00 (1.00, 1.00) | 0.010 | 1.00 (1.00, 1.00) | 0.250 | 1.00 (1.00, 1.01) | 0.288 | 1.01 (1.00, 1.01) | 0.002 | 1.01 (1.00, 1.01) | <0.001 |
| HbA1c (%) | 0.98 (0.88, 1.10) | 0.845 | 1.08 (0.97, 1.21) | 0.268 | 1.17 (1.00, 1.37) | 0.114 | 4.52 (3.18, 6.44) | <0.001 | 1.50 (1.38, 1.63) | <0.001 |
| Vitamin D category: <20 (deficient) | 0.87 (0.51, 1.47) | 0.725 | 0.66 (0.32, 1.37) | 0.369 | 1.14 (0.26, 5.00) | 0.918 | 0.61 (0.14, 2.62) | 0.634 | 1.79 (1.05, 3.04) | 0.073 |
| Vitamin D category: 20–29 (insufficient) | 0.99 (0.72, 1.36) | 0.970 | 0.69 (0.47, 1.02) | 0.116 | 0.87 (0.41, 1.88) | 0.833 | 1.81 (1.12, 2.92) | 0.046 | 1.18 (0.87, 1.60) | 0.381 |
| Vitamin D category: ≥30 (sufficient) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Vitamin D category trend | 1.04 (0.84, 1.29) | 0.833 | 1.35 (1.03, 1.77) | 0.069 | 1.04 (0.57, 1.91) | 0.948 | 0.76 (0.54, 1.07) | 0.197 | 0.79 (0.63, 1.00) | 0.099 |
| DII | 1.01 (0.98, 1.05) | 0.606 | 0.96 (0.92, 0.99) | 0.050 | 1.15 (1.01, 1.30) | 0.089 | 1.05 (0.97, 1.14) | 0.327 | 1.05 (1.01, 1.09) | 0.050 |
| DII quartile trend | 1.03 (0.96, 1.10) | 0.606 | 0.93 (0.86, 1.00) | 0.095 | 1.30 (1.00, 1.68) | 0.099 | 1.11 (0.95, 1.30) | 0.300 | 1.09 (1.00, 1.18) | 0.099 |
| DII including alcohol | 1.01 (0.98, 1.04) | 0.634 | 0.96 (0.92, 0.99) | 0.044 | 1.15 (1.01, 1.31) | 0.072 | 1.06 (0.98, 1.14) | 0.210 | 1.05 (1.01, 1.08) | 0.049 |
| 1 p values were corrected for multiple comparisons using the Benjamini-Hochberg false discovery rate (FDR) method | ||||||||||
tbl_uni1_secondary| Table 6: Univariable Associations of Core Covariates With Eczema and Food Allergy | ||||
|---|---|---|---|---|
| Variable | Atopic dermatitis OR (95% CI) | Atopic dermatitis p value1 | Food allergy OR (95% CI) | Food allergy p value1 |
| Age (years) | 0.99 (0.94, 1.04) | 0.837 | 1.00 (0.99, 1.02) | 0.775 |
| Sex: Male | 1 (Ref) | 1 (Ref) | ||
| Sex: Female | 1.68 (0.95, 2.98) | 0.271 | 1.40 (1.09, 1.79) | 0.088 |
| Race/ethnicity: Non-Hispanic White | 1 (Ref) | 1 (Ref) | ||
| Race/ethnicity: Non-Hispanic Black | 1.00 (0.55, 1.83) | 0.993 | 1.57 (1.15, 2.14) | 0.058 |
| Race/ethnicity: Mexican American | 0.19 (0.08, 0.44) | 0.013 | 0.62 (0.48, 0.81) | 0.013 |
| Race/ethnicity: Other Hispanic | 0.21 (0.06, 0.74) | 0.097 | 1.41 (0.94, 2.13) | 0.316 |
| Insurance: No | 1 (Ref) | 1 (Ref) | ||
| Insurance: Yes | 1.57 (0.90, 2.74) | 0.316 | 1.33 (0.94, 1.87) | 0.316 |
| Education: <9 years | 0.10 (0.06, 0.18) | <0.001 | 0.47 (0.23, 0.96) | 0.169 |
| Education: 9–12 years | 0.99 (0.54, 1.82) | 0.993 | 0.64 (0.50, 0.83) | 0.013 |
| Education: >12 years | 1 (Ref) | 1 (Ref) | ||
| Poverty-income ratio | 1.05 (0.92, 1.20) | 0.698 | 1.02 (0.95, 1.11) | 0.770 |
| Food security: Full | 1 (Ref) | 1 (Ref) | ||
| Food security: Marginal | 1.54 (0.95, 2.50) | 0.271 | 0.78 (0.50, 1.22) | 0.571 |
| Food security: Low | 0.95 (0.36, 2.54) | 0.993 | NA | NA |
| Food security: Very low | 1.46 (0.60, 3.52) | 0.659 | NA | NA |
| Smoking status: Never | 1 (Ref) | 1 (Ref) | ||
| Smoking status: Current | 1.69 (1.02, 2.79) | 0.179 | 0.94 (0.70, 1.26) | 0.837 |
| Smoking status: Former | 1.06 (0.37, 3.04) | 0.993 | 1.27 (0.88, 1.83) | 0.502 |
| Physical activity: Sedentary | 1 (Ref) | 1 (Ref) | ||
| Physical activity: Moderate | 1.30 (0.50, 3.34) | 0.775 | 1.23 (0.84, 1.80) | 0.577 |
| Physical activity: Intense | 1.28 (0.57, 2.87) | 0.770 | 1.16 (0.83, 1.61) | 0.659 |
| Alcohol use: No | 1 (Ref) | 1 (Ref) | ||
| Alcohol use: Yes | 1.58 (0.68, 3.67) | 0.571 | 0.77 (0.60, 1.00) | 0.186 |
| Systolic blood pressure category: <120 | 1 (Ref) | 1 (Ref) | ||
| Systolic blood pressure category: 120–129 | 1.05 (0.56, 1.97) | 0.988 | 0.59 (0.38, 0.91) | 0.097 |
| Systolic blood pressure category: 130–139 | 1.02 (0.37, 2.78) | 0.993 | 1.05 (0.61, 1.80) | 0.988 |
| Systolic blood pressure category: 140–159 | 0.97 (0.25, 3.75) | 0.993 | 0.67 (0.30, 1.49) | 0.617 |
| Systolic blood pressure category: ≥160 | 0.00 (0.00, 0.00) | <0.001 | 0.00 (0.00, 0.00) | <0.001 |
| Diastolic blood pressure category: <80 | 1 (Ref) | 1 (Ref) | ||
| Diastolic blood pressure category: 80–89 | 1.60 (0.60, 4.27) | 0.617 | 0.98 (0.65, 1.49) | 0.993 |
| Diastolic blood pressure category: 90–99 | 0.44 (0.10, 1.99) | 0.571 | 1.60 (0.89, 2.86) | 0.316 |
| Diastolic blood pressure category: ≥100 | 0.00 (0.00, 0.00) | <0.001 | 1.69 (0.42, 6.86) | 0.717 |
| BMI category: Underweight (<18.5) | 0.39 (0.07, 2.21) | 0.571 | 1.00 (0.35, 2.80) | 0.993 |
| BMI category: Normal (18.5–24.9) | 1 (Ref) | 1 (Ref) | ||
| BMI category: Overweight (25–29.9) | 0.51 (0.30, 0.86) | 0.097 | 0.94 (0.68, 1.30) | 0.847 |
| BMI category: Obesity I (30–34.9) | 0.81 (0.41, 1.60) | 0.755 | 1.04 (0.71, 1.52) | 0.951 |
| BMI category: Obesity II (35–39.9) | 0.81 (0.33, 2.00) | 0.801 | 0.98 (0.56, 1.69) | 0.993 |
| BMI category: Obesity III (≥40) | 2.22 (1.24, 3.97) | 0.088 | 1.20 (0.76, 1.90) | 0.698 |
| Waist-to-height ratio category: <0.5 | 1 (Ref) | 1 (Ref) | ||
| Waist-to-height ratio category: 0.5–0.59 | 0.81 (0.44, 1.49) | 0.725 | 0.87 (0.62, 1.20) | 0.659 |
| Waist-to-height ratio category: ≥0.6 | 1.22 (0.72, 2.09) | 0.698 | 1.04 (0.73, 1.49) | 0.951 |
| HDL cholesterol (mg/dL) | 1.00 (0.99, 1.01) | 0.725 | 1.00 (0.99, 1.01) | 0.775 |
| Non-HDL cholesterol (mg/dL) | 1.00 (0.99, 1.01) | 0.770 | 1.00 (0.99, 1.00) | 0.329 |
| Total cholesterol (mg/dL) | 1.00 (0.99, 1.01) | 0.775 | 1.00 (0.99, 1.00) | 0.316 |
| HbA1c (%) | 0.71 (0.38, 1.32) | 0.571 | 0.92 (0.76, 1.11) | 0.659 |
| Vitamin D category: <20 (deficient) | 1.54 (0.13, 17.72) | 0.847 | 2.04 (0.92, 4.51) | 0.271 |
| Vitamin D category: 20–29 (insufficient) | 0.79 (0.34, 1.85) | 0.775 | 1.65 (1.11, 2.45) | 0.097 |
| Vitamin D category: ≥30 (sufficient) | 1 (Ref) | 1 (Ref) | ||
| Vitamin D category trend | 1.14 (0.59, 2.17) | 0.837 | 0.66 (0.48, 0.91) | 0.088 |
| DII | 1.07 (0.96, 1.19) | 0.560 | 0.95 (0.91, 0.99) | 0.097 |
| DII quartile trend | 1.12 (0.85, 1.47) | 0.698 | 0.93 (0.84, 1.03) | 0.397 |
| DII including alcohol | 1.06 (0.94, 1.19) | 0.617 | 0.95 (0.92, 0.99) | 0.117 |
| 1 p values were corrected for multiple comparisons using the Benjamini-Hochberg false discovery rate (FDR) method | ||||
tbl_uni2_primary| Table 7: Univariable Associations of Individual Nutritional Covariates With Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Variable | Asthma OR (95% CI) | Asthma p value1 | Allergic rhinitis OR (95% CI) | Allergic rhinitis p value1 | Inflammatory arthritis OR (95% CI) | Inflammatory arthritis p value1 | Diabetes OR (95% CI) | Diabetes p value1 | Hypertension OR (95% CI) | Hypertension p value1 |
| Total energy intake (kcal): Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Total energy intake (kcal): Q2 | 0.92 (0.75, 1.14) | 0.781 | 1.27 (1.00, 1.62) | 0.366 | 0.37 (0.19, 0.73) | 0.100 | 0.83 (0.51, 1.35) | 0.781 | 1.16 (0.89, 1.53) | 0.646 |
| Total energy intake (kcal): Q3 | 0.74 (0.59, 0.93) | 0.167 | 1.05 (0.85, 1.30) | 0.856 | 0.31 (0.16, 0.60) | 0.047 | 0.78 (0.48, 1.27) | 0.655 | 1.08 (0.85, 1.39) | 0.821 |
| Total energy intake (kcal): Q4 | 1.01 (0.80, 1.26) | 0.968 | 1.22 (0.95, 1.57) | 0.524 | 0.51 (0.25, 1.01) | 0.373 | 0.66 (0.36, 1.19) | 0.555 | 1.28 (1.01, 1.61) | 0.308 |
| Carbohydrate intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Carbohydrate intake: Q2 | 0.92 (0.72, 1.18) | 0.821 | 1.35 (1.06, 1.74) | 0.213 | 0.76 (0.40, 1.45) | 0.750 | 0.63 (0.39, 1.00) | 0.371 | 1.14 (0.89, 1.46) | 0.663 |
| Carbohydrate intake: Q3 | 0.83 (0.67, 1.03) | 0.483 | 1.18 (0.92, 1.51) | 0.574 | 0.26 (0.13, 0.49) | 0.019 | 0.44 (0.25, 0.80) | 0.133 | 0.99 (0.78, 1.27) | 0.971 |
| Carbohydrate intake: Q4 | 1.04 (0.82, 1.32) | 0.883 | 1.13 (0.87, 1.47) | 0.703 | 0.73 (0.37, 1.45) | 0.725 | 0.37 (0.21, 0.68) | 0.062 | 1.12 (0.90, 1.39) | 0.653 |
| Protein intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Protein intake: Q2 | 0.84 (0.67, 1.06) | 0.531 | 1.15 (0.87, 1.53) | 0.665 | 0.49 (0.27, 0.90) | 0.237 | 1.23 (0.75, 2.01) | 0.763 | 0.99 (0.79, 1.25) | 0.968 |
| Protein intake: Q3 | 0.89 (0.70, 1.11) | 0.653 | 1.23 (0.96, 1.59) | 0.511 | 0.54 (0.28, 1.02) | 0.382 | 1.22 (0.72, 2.07) | 0.781 | 1.14 (0.90, 1.44) | 0.653 |
| Protein intake: Q4 | 0.80 (0.64, 1.02) | 0.419 | 1.24 (0.94, 1.63) | 0.524 | 0.50 (0.26, 0.94) | 0.291 | 1.15 (0.74, 1.78) | 0.821 | 1.18 (0.95, 1.46) | 0.524 |
| Total fat intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Total fat intake: Q2 | 1.04 (0.81, 1.34) | 0.883 | 1.03 (0.79, 1.35) | 0.900 | 0.48 (0.24, 0.95) | 0.295 | 0.69 (0.41, 1.16) | 0.555 | 0.89 (0.67, 1.17) | 0.750 |
| Total fat intake: Q3 | 0.96 (0.74, 1.25) | 0.883 | 1.09 (0.87, 1.35) | 0.781 | 0.51 (0.30, 0.87) | 0.208 | 0.77 (0.43, 1.38) | 0.727 | 1.11 (0.89, 1.39) | 0.714 |
| Total fat intake: Q4 | 1.06 (0.83, 1.35) | 0.856 | 1.20 (0.94, 1.53) | 0.524 | 0.60 (0.29, 1.21) | 0.534 | 1.23 (0.73, 2.07) | 0.781 | 1.30 (1.02, 1.66) | 0.295 |
| Saturated fat intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Saturated fat intake: Q2 | 0.94 (0.77, 1.14) | 0.816 | 1.09 (0.86, 1.39) | 0.781 | 0.76 (0.38, 1.54) | 0.781 | 0.73 (0.43, 1.23) | 0.610 | 1.01 (0.78, 1.31) | 0.971 |
| Saturated fat intake: Q3 | 0.96 (0.76, 1.23) | 0.883 | 1.12 (0.88, 1.42) | 0.703 | 0.64 (0.33, 1.25) | 0.579 | 0.80 (0.50, 1.28) | 0.703 | 1.01 (0.79, 1.29) | 0.968 |
| Saturated fat intake: Q4 | 0.97 (0.76, 1.24) | 0.898 | 1.21 (0.94, 1.56) | 0.531 | 0.82 (0.42, 1.57) | 0.821 | 0.99 (0.59, 1.66) | 0.971 | 1.29 (1.04, 1.60) | 0.218 |
| Polyunsaturated fat intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Polyunsaturated fat intake: Q2 | 0.94 (0.74, 1.20) | 0.856 | 0.96 (0.75, 1.23) | 0.877 | 0.73 (0.36, 1.48) | 0.740 | 0.86 (0.47, 1.56) | 0.852 | 0.94 (0.77, 1.16) | 0.844 |
| Polyunsaturated fat intake: Q3 | 0.97 (0.79, 1.19) | 0.883 | 0.90 (0.70, 1.16) | 0.750 | 0.86 (0.39, 1.90) | 0.877 | 0.66 (0.39, 1.11) | 0.524 | 0.97 (0.77, 1.21) | 0.883 |
| Polyunsaturated fat intake: Q4 | 1.16 (0.95, 1.43) | 0.531 | 0.96 (0.77, 1.19) | 0.875 | 1.73 (0.86, 3.51) | 0.524 | 0.89 (0.53, 1.50) | 0.860 | 0.85 (0.67, 1.08) | 0.569 |
| Monounsaturated fat intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Monounsaturated fat intake: Q2 | 0.88 (0.72, 1.09) | 0.610 | 0.84 (0.65, 1.09) | 0.578 | 0.64 (0.32, 1.29) | 0.600 | 0.62 (0.35, 1.11) | 0.511 | 0.87 (0.68, 1.10) | 0.614 |
| Monounsaturated fat intake: Q3 | 1.08 (0.89, 1.31) | 0.781 | 0.86 (0.66, 1.12) | 0.638 | 0.71 (0.39, 1.32) | 0.646 | 0.49 (0.28, 0.83) | 0.148 | 0.68 (0.53, 0.87) | 0.082 |
| Monounsaturated fat intake: Q4 | 1.03 (0.82, 1.29) | 0.898 | 0.83 (0.64, 1.06) | 0.524 | 1.58 (0.84, 2.95) | 0.531 | 0.71 (0.42, 1.20) | 0.586 | 0.81 (0.65, 1.01) | 0.407 |
| Omega-3 fat intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Omega-3 fat intake: Q2 | 0.98 (0.79, 1.21) | 0.900 | 1.07 (0.86, 1.32) | 0.834 | 1.61 (0.73, 3.54) | 0.610 | 1.10 (0.61, 1.97) | 0.883 | 0.95 (0.77, 1.16) | 0.852 |
| Omega-3 fat intake: Q3 | 0.90 (0.74, 1.10) | 0.653 | 0.98 (0.78, 1.23) | 0.916 | 0.76 (0.32, 1.83) | 0.821 | 1.26 (0.70, 2.28) | 0.781 | 1.02 (0.85, 1.24) | 0.898 |
| Omega-3 fat intake: Q4 | 1.11 (0.92, 1.35) | 0.646 | 0.86 (0.67, 1.11) | 0.614 | 2.88 (1.29, 6.41) | 0.157 | 1.46 (0.89, 2.38) | 0.524 | 0.96 (0.76, 1.21) | 0.877 |
| Omega-6 fat intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Omega-6 fat intake: Q2 | 0.94 (0.73, 1.21) | 0.856 | 0.97 (0.75, 1.24) | 0.892 | 0.81 (0.38, 1.75) | 0.852 | 0.82 (0.45, 1.49) | 0.821 | 0.84 (0.67, 1.04) | 0.524 |
| Omega-6 fat intake: Q3 | 0.98 (0.78, 1.22) | 0.913 | 0.86 (0.68, 1.09) | 0.600 | 0.95 (0.46, 1.98) | 0.943 | 0.68 (0.41, 1.14) | 0.531 | 0.90 (0.71, 1.14) | 0.725 |
| Omega-6 fat intake: Q4 | 1.13 (0.91, 1.40) | 0.647 | 0.99 (0.79, 1.23) | 0.957 | 1.69 (0.85, 3.37) | 0.524 | 0.86 (0.52, 1.43) | 0.834 | 0.84 (0.67, 1.06) | 0.527 |
| Cholesterol intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Cholesterol intake: Q2 | 0.99 (0.78, 1.26) | 0.968 | 1.15 (0.89, 1.48) | 0.653 | 0.71 (0.39, 1.29) | 0.634 | 0.86 (0.44, 1.68) | 0.861 | 0.96 (0.74, 1.26) | 0.893 |
| Cholesterol intake: Q3 | 0.83 (0.67, 1.03) | 0.483 | 0.97 (0.78, 1.22) | 0.898 | 0.61 (0.31, 1.20) | 0.531 | 1.26 (0.73, 2.18) | 0.748 | 1.19 (0.95, 1.50) | 0.524 |
| Cholesterol intake: Q4 | 0.81 (0.63, 1.03) | 0.483 | 1.07 (0.82, 1.39) | 0.852 | 0.63 (0.33, 1.22) | 0.555 | 1.30 (0.75, 2.27) | 0.703 | 1.39 (1.11, 1.73) | 0.095 |
| Vitamin A intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Vitamin A intake: Q2 | 0.92 (0.74, 1.16) | 0.808 | 0.82 (0.65, 1.04) | 0.511 | 0.91 (0.44, 1.87) | 0.896 | 1.47 (0.84, 2.57) | 0.569 | 1.08 (0.83, 1.40) | 0.842 |
| Vitamin A intake: Q3 | 0.80 (0.65, 1.00) | 0.373 | 0.72 (0.57, 0.91) | 0.125 | 0.90 (0.45, 1.82) | 0.886 | 1.91 (1.13, 3.26) | 0.213 | 1.16 (0.88, 1.54) | 0.653 |
| Vitamin A intake: Q4 | 1.05 (0.84, 1.31) | 0.856 | 0.60 (0.46, 0.77) | 0.019 | 1.22 (0.58, 2.55) | 0.852 | 1.66 (0.86, 3.19) | 0.524 | 1.23 (0.95, 1.60) | 0.524 |
| Beta-carotene intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Beta-carotene intake: Q2 | 1.02 (0.83, 1.26) | 0.902 | 0.82 (0.64, 1.05) | 0.524 | 0.76 (0.39, 1.45) | 0.750 | 1.60 (0.93, 2.75) | 0.481 | 1.08 (0.85, 1.38) | 0.821 |
| Beta-carotene intake: Q3 | 1.06 (0.83, 1.36) | 0.856 | 0.65 (0.51, 0.83) | 0.047 | 1.47 (0.72, 3.02) | 0.653 | 1.54 (0.85, 2.78) | 0.531 | 1.25 (0.97, 1.61) | 0.477 |
| Beta-carotene intake: Q4 | 1.11 (0.89, 1.39) | 0.703 | 0.70 (0.56, 0.88) | 0.081 | 2.05 (1.05, 3.99) | 0.295 | 1.41 (0.82, 2.41) | 0.596 | 1.34 (1.02, 1.76) | 0.295 |
| Vitamin C intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Vitamin C intake: Q2 | 1.17 (0.92, 1.49) | 0.590 | 1.27 (1.03, 1.56) | 0.250 | 1.20 (0.52, 2.80) | 0.861 | 1.44 (0.89, 2.33) | 0.528 | 1.09 (0.83, 1.44) | 0.821 |
| Vitamin C intake: Q3 | 1.23 (1.02, 1.48) | 0.291 | 0.93 (0.69, 1.26) | 0.856 | 1.92 (0.99, 3.74) | 0.373 | 1.06 (0.60, 1.88) | 0.900 | 1.14 (0.86, 1.51) | 0.713 |
| Vitamin C intake: Q4 | 1.23 (1.00, 1.51) | 0.371 | 0.90 (0.68, 1.19) | 0.781 | 2.46 (1.12, 5.40) | 0.255 | 1.18 (0.69, 2.00) | 0.821 | 1.31 (0.98, 1.75) | 0.429 |
| Vitamin E intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Vitamin E intake: Q2 | 0.82 (0.64, 1.05) | 0.524 | 0.97 (0.78, 1.21) | 0.898 | 0.93 (0.50, 1.73) | 0.900 | 0.56 (0.31, 1.01) | 0.373 | 1.08 (0.80, 1.46) | 0.852 |
| Vitamin E intake: Q3 | 0.94 (0.74, 1.20) | 0.856 | 0.82 (0.65, 1.03) | 0.483 | 1.01 (0.53, 1.94) | 0.981 | 0.83 (0.46, 1.50) | 0.821 | 1.10 (0.85, 1.42) | 0.781 |
| Vitamin E intake: Q4 | 0.95 (0.76, 1.20) | 0.871 | 0.67 (0.54, 0.82) | 0.024 | 1.53 (0.77, 3.04) | 0.602 | 1.10 (0.69, 1.77) | 0.868 | 1.15 (0.89, 1.48) | 0.646 |
| Vitamin B6 intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Vitamin B6 intake: Q2 | 0.93 (0.78, 1.11) | 0.745 | 0.97 (0.77, 1.22) | 0.898 | 0.77 (0.35, 1.67) | 0.811 | 1.18 (0.70, 1.99) | 0.821 | 0.88 (0.68, 1.13) | 0.653 |
| Vitamin B6 intake: Q3 | 0.96 (0.78, 1.18) | 0.877 | 0.90 (0.70, 1.16) | 0.759 | 1.51 (0.78, 2.89) | 0.600 | 1.24 (0.77, 1.99) | 0.727 | 1.04 (0.84, 1.29) | 0.875 |
| Vitamin B6 intake: Q4 | 1.10 (0.89, 1.36) | 0.725 | 0.82 (0.66, 1.03) | 0.477 | 2.27 (1.13, 4.56) | 0.242 | 1.17 (0.65, 2.08) | 0.852 | 1.15 (0.91, 1.45) | 0.610 |
| Vitamin B12 intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Vitamin B12 intake: Q2 | 1.02 (0.82, 1.27) | 0.916 | 1.11 (0.90, 1.38) | 0.686 | 0.87 (0.47, 1.59) | 0.856 | 0.74 (0.44, 1.22) | 0.610 | 1.01 (0.75, 1.35) | 0.974 |
| Vitamin B12 intake: Q3 | 0.87 (0.69, 1.09) | 0.600 | 1.03 (0.78, 1.36) | 0.898 | 1.07 (0.59, 1.93) | 0.898 | 0.67 (0.40, 1.12) | 0.524 | 0.93 (0.72, 1.19) | 0.821 |
| Vitamin B12 intake: Q4 | 1.03 (0.83, 1.28) | 0.898 | 1.20 (0.94, 1.52) | 0.524 | 0.59 (0.27, 1.32) | 0.585 | 0.76 (0.42, 1.36) | 0.703 | 1.15 (0.90, 1.48) | 0.646 |
| Folic acid intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Folic acid intake: Q2 | 1.28 (1.03, 1.58) | 0.250 | 0.92 (0.72, 1.16) | 0.787 | 0.96 (0.46, 2.01) | 0.957 | 0.80 (0.51, 1.23) | 0.653 | 1.20 (0.92, 1.57) | 0.559 |
| Folic acid intake: Q3 | 1.05 (0.85, 1.30) | 0.856 | 0.91 (0.72, 1.15) | 0.781 | 1.33 (0.63, 2.82) | 0.781 | 1.22 (0.72, 2.09) | 0.781 | 1.21 (0.96, 1.53) | 0.509 |
| Folic acid intake: Q4 | 1.23 (0.97, 1.55) | 0.477 | 0.98 (0.77, 1.24) | 0.907 | 1.61 (0.79, 3.31) | 0.579 | 1.23 (0.70, 2.15) | 0.786 | 1.44 (1.16, 1.80) | 0.058 |
| Thiamin intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Thiamin intake: Q2 | 0.96 (0.75, 1.23) | 0.883 | 0.93 (0.73, 1.19) | 0.842 | 1.35 (0.58, 3.13) | 0.793 | 1.03 (0.67, 1.60) | 0.935 | 1.04 (0.84, 1.28) | 0.877 |
| Thiamin intake: Q3 | 0.87 (0.70, 1.08) | 0.589 | 0.96 (0.76, 1.20) | 0.877 | 1.21 (0.54, 2.68) | 0.856 | 0.96 (0.57, 1.61) | 0.918 | 1.07 (0.86, 1.32) | 0.821 |
| Thiamin intake: Q4 | 1.17 (0.92, 1.49) | 0.585 | 0.81 (0.64, 1.03) | 0.477 | 2.69 (1.20, 6.01) | 0.213 | 1.12 (0.62, 2.03) | 0.875 | 1.05 (0.85, 1.30) | 0.860 |
| Riboflavin intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Riboflavin intake: Q2 | 0.82 (0.67, 1.01) | 0.376 | 0.85 (0.67, 1.07) | 0.555 | 1.04 (0.51, 2.13) | 0.957 | 1.05 (0.69, 1.60) | 0.898 | 1.05 (0.85, 1.30) | 0.859 |
| Riboflavin intake: Q3 | 0.96 (0.75, 1.22) | 0.877 | 0.90 (0.73, 1.10) | 0.653 | 1.28 (0.63, 2.61) | 0.811 | 1.03 (0.57, 1.86) | 0.958 | 1.04 (0.81, 1.34) | 0.883 |
| Riboflavin intake: Q4 | 1.03 (0.84, 1.27) | 0.892 | 0.61 (0.46, 0.80) | 0.047 | 1.60 (0.80, 3.19) | 0.571 | 1.10 (0.64, 1.88) | 0.877 | 1.05 (0.80, 1.37) | 0.883 |
| Niacin intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Niacin intake: Q2 | 1.05 (0.83, 1.33) | 0.860 | 1.08 (0.85, 1.37) | 0.821 | 0.82 (0.38, 1.78) | 0.856 | 1.58 (0.93, 2.69) | 0.483 | 1.07 (0.83, 1.37) | 0.852 |
| Niacin intake: Q3 | 1.02 (0.82, 1.26) | 0.935 | 1.16 (0.93, 1.45) | 0.571 | 1.50 (0.76, 2.94) | 0.614 | 1.42 (0.78, 2.58) | 0.614 | 1.03 (0.85, 1.24) | 0.893 |
| Niacin intake: Q4 | 1.16 (0.94, 1.43) | 0.544 | 0.94 (0.73, 1.20) | 0.852 | 1.41 (0.74, 2.68) | 0.653 | 1.11 (0.63, 1.96) | 0.877 | 1.02 (0.83, 1.24) | 0.916 |
| Iron intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Iron intake: Q2 | 0.85 (0.70, 1.03) | 0.483 | 1.01 (0.80, 1.28) | 0.957 | 0.73 (0.43, 1.24) | 0.614 | 1.00 (0.58, 1.73) | 0.996 | 1.04 (0.83, 1.31) | 0.877 |
| Iron intake: Q3 | 0.88 (0.71, 1.10) | 0.638 | 1.13 (0.88, 1.44) | 0.686 | 0.44 (0.22, 0.90) | 0.255 | 0.78 (0.43, 1.41) | 0.759 | 1.01 (0.82, 1.25) | 0.962 |
| Iron intake: Q4 | 0.84 (0.67, 1.05) | 0.524 | 1.20 (0.93, 1.56) | 0.544 | 0.57 (0.28, 1.19) | 0.524 | 0.89 (0.48, 1.65) | 0.877 | 0.92 (0.74, 1.15) | 0.787 |
| Magnesium intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Magnesium intake: Q2 | 0.91 (0.72, 1.16) | 0.781 | 0.95 (0.72, 1.26) | 0.877 | 1.28 (0.54, 3.05) | 0.842 | 0.85 (0.43, 1.69) | 0.856 | 0.96 (0.74, 1.25) | 0.883 |
| Magnesium intake: Q3 | 0.94 (0.75, 1.17) | 0.844 | 0.87 (0.69, 1.09) | 0.600 | 1.79 (0.83, 3.88) | 0.524 | 1.16 (0.69, 1.95) | 0.844 | 0.95 (0.76, 1.19) | 0.868 |
| Magnesium intake: Q4 | 1.16 (0.93, 1.44) | 0.571 | 0.72 (0.55, 0.94) | 0.213 | 2.88 (1.34, 6.18) | 0.131 | 1.23 (0.70, 2.14) | 0.786 | 1.04 (0.80, 1.36) | 0.883 |
| Zinc intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Zinc intake: Q2 | 0.94 (0.75, 1.18) | 0.852 | 1.03 (0.82, 1.30) | 0.883 | 0.96 (0.46, 2.01) | 0.957 | 0.75 (0.39, 1.45) | 0.741 | 0.66 (0.51, 0.86) | 0.082 |
| Zinc intake: Q3 | 1.14 (0.90, 1.43) | 0.646 | 1.08 (0.85, 1.36) | 0.821 | 0.80 (0.38, 1.68) | 0.826 | 0.89 (0.45, 1.75) | 0.877 | 0.81 (0.67, 0.99) | 0.295 |
| Zinc intake: Q4 | 1.09 (0.87, 1.37) | 0.781 | 0.91 (0.70, 1.18) | 0.781 | 1.53 (0.77, 3.02) | 0.602 | 1.13 (0.66, 1.94) | 0.859 | 0.88 (0.71, 1.09) | 0.608 |
| Selenium intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Selenium intake: Q2 | 1.23 (0.99, 1.53) | 0.391 | 1.17 (0.89, 1.52) | 0.632 | 1.17 (0.54, 2.54) | 0.875 | 1.37 (0.79, 2.37) | 0.634 | 1.05 (0.82, 1.33) | 0.877 |
| Selenium intake: Q3 | 1.10 (0.89, 1.36) | 0.725 | 0.88 (0.70, 1.11) | 0.653 | 1.05 (0.51, 2.18) | 0.935 | 1.47 (0.89, 2.42) | 0.524 | 0.97 (0.78, 1.20) | 0.883 |
| Selenium intake: Q4 | 1.39 (1.11, 1.73) | 0.100 | 0.97 (0.77, 1.23) | 0.896 | 2.11 (1.07, 4.14) | 0.291 | 1.09 (0.63, 1.87) | 0.883 | 0.87 (0.69, 1.11) | 0.632 |
| Fiber intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Fiber intake: Q2 | 0.95 (0.75, 1.20) | 0.870 | 0.92 (0.72, 1.19) | 0.821 | 1.25 (0.50, 3.18) | 0.856 | 1.00 (0.58, 1.71) | 0.990 | 1.18 (0.92, 1.53) | 0.579 |
| Fiber intake: Q3 | 1.00 (0.80, 1.27) | 0.978 | 0.80 (0.65, 0.98) | 0.295 | 1.23 (0.52, 2.91) | 0.856 | 1.17 (0.67, 2.05) | 0.842 | 1.09 (0.85, 1.41) | 0.811 |
| Fiber intake: Q4 | 1.18 (0.92, 1.53) | 0.579 | 0.69 (0.54, 0.88) | 0.094 | 3.02 (1.32, 6.94) | 0.156 | 1.19 (0.68, 2.09) | 0.821 | 1.45 (1.16, 1.82) | 0.058 |
| Caffeine intake: Q1 | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | 1 (Ref) | |||||
| Caffeine intake: Q2 | 0.78 (0.64, 0.96) | 0.213 | 0.83 (0.66, 1.04) | 0.509 | 0.38 (0.19, 0.75) | 0.125 | 0.87 (0.48, 1.57) | 0.856 | 0.85 (0.65, 1.11) | 0.603 |
| Caffeine intake: Q3 | 0.72 (0.58, 0.89) | 0.081 | 0.47 (0.37, 0.60) | <0.001 | 0.73 (0.41, 1.29) | 0.646 | 0.98 (0.59, 1.63) | 0.963 | 0.88 (0.65, 1.18) | 0.734 |
| Caffeine intake: Q4 | 0.95 (0.76, 1.18) | 0.856 | 0.64 (0.50, 0.83) | 0.047 | 0.46 (0.22, 0.96) | 0.295 | 1.21 (0.72, 2.04) | 0.786 | 0.88 (0.69, 1.12) | 0.653 |
| 1 p values were corrected for multiple comparisons using the Benjamini-Hochberg false discovery rate (FDR) method | ||||||||||
tbl_uni2_secondary| Table 8: Univariable Associations of Individual Nutritional Covariates With Eczema and Food Allergy | ||||
|---|---|---|---|---|
| Variable | Atopic dermatitis OR (95% CI) | Atopic dermatitis p value1 | Food allergy OR (95% CI) | Food allergy p value1 |
| Total energy intake (kcal): Q1 | 1 (Ref) | 1 (Ref) | ||
| Total energy intake (kcal): Q2 | 1.34 (0.71, 2.52) | 0.905 | 1.09 (0.69, 1.71) | 0.921 |
| Total energy intake (kcal): Q3 | 0.85 (0.43, 1.67) | 0.921 | 1.05 (0.74, 1.49) | 0.921 |
| Total energy intake (kcal): Q4 | 0.78 (0.45, 1.36) | 0.905 | 1.04 (0.69, 1.57) | 0.939 |
| Carbohydrate intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Carbohydrate intake: Q2 | 1.42 (0.57, 3.51) | 0.909 | 1.13 (0.76, 1.66) | 0.921 |
| Carbohydrate intake: Q3 | 1.36 (0.53, 3.47) | 0.921 | 1.00 (0.63, 1.57) | 0.990 |
| Carbohydrate intake: Q4 | 0.84 (0.51, 1.39) | 0.921 | 1.16 (0.79, 1.71) | 0.909 |
| Protein intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Protein intake: Q2 | 0.95 (0.45, 2.04) | 0.942 | 1.28 (0.85, 1.92) | 0.892 |
| Protein intake: Q3 | 0.76 (0.33, 1.72) | 0.921 | 1.42 (0.98, 2.05) | 0.727 |
| Protein intake: Q4 | 0.67 (0.36, 1.25) | 0.892 | 0.92 (0.61, 1.38) | 0.921 |
| Total fat intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Total fat intake: Q2 | 0.74 (0.30, 1.85) | 0.921 | 1.11 (0.71, 1.75) | 0.921 |
| Total fat intake: Q3 | 0.93 (0.44, 1.93) | 0.937 | 1.08 (0.68, 1.72) | 0.921 |
| Total fat intake: Q4 | 0.62 (0.38, 1.04) | 0.727 | 1.11 (0.76, 1.60) | 0.921 |
| Saturated fat intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Saturated fat intake: Q2 | 0.77 (0.39, 1.52) | 0.909 | 1.04 (0.76, 1.43) | 0.921 |
| Saturated fat intake: Q3 | 0.87 (0.47, 1.62) | 0.921 | 1.17 (0.76, 1.80) | 0.921 |
| Saturated fat intake: Q4 | 0.74 (0.42, 1.30) | 0.905 | 0.86 (0.59, 1.26) | 0.909 |
| Polyunsaturated fat intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Polyunsaturated fat intake: Q2 | 0.65 (0.31, 1.34) | 0.892 | 0.85 (0.60, 1.21) | 0.905 |
| Polyunsaturated fat intake: Q3 | 1.05 (0.51, 2.16) | 0.942 | 0.91 (0.64, 1.31) | 0.921 |
| Polyunsaturated fat intake: Q4 | 1.16 (0.67, 1.99) | 0.921 | 0.71 (0.53, 0.96) | 0.641 |
| Monounsaturated fat intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Monounsaturated fat intake: Q2 | 1.52 (0.73, 3.16) | 0.892 | 1.09 (0.70, 1.69) | 0.921 |
| Monounsaturated fat intake: Q3 | 1.50 (0.75, 2.97) | 0.892 | 1.16 (0.76, 1.76) | 0.921 |
| Monounsaturated fat intake: Q4 | 1.51 (1.00, 2.29) | 0.669 | 0.98 (0.69, 1.38) | 0.942 |
| Omega-3 fat intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Omega-3 fat intake: Q2 | 0.74 (0.30, 1.85) | 0.921 | 1.18 (0.91, 1.54) | 0.892 |
| Omega-3 fat intake: Q3 | 1.33 (0.70, 2.51) | 0.905 | 0.94 (0.69, 1.30) | 0.921 |
| Omega-3 fat intake: Q4 | 1.21 (0.77, 1.90) | 0.905 | 0.93 (0.63, 1.36) | 0.921 |
| Omega-6 fat intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Omega-6 fat intake: Q2 | 0.90 (0.46, 1.75) | 0.921 | 0.77 (0.55, 1.07) | 0.802 |
| Omega-6 fat intake: Q3 | 1.24 (0.63, 2.42) | 0.921 | 0.82 (0.55, 1.20) | 0.905 |
| Omega-6 fat intake: Q4 | 1.35 (0.80, 2.29) | 0.892 | 0.72 (0.55, 0.96) | 0.641 |
| Cholesterol intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Cholesterol intake: Q2 | 0.81 (0.37, 1.73) | 0.921 | 1.34 (0.87, 2.06) | 0.892 |
| Cholesterol intake: Q3 | 0.63 (0.30, 1.35) | 0.892 | 0.95 (0.64, 1.40) | 0.921 |
| Cholesterol intake: Q4 | 0.70 (0.41, 1.19) | 0.892 | 1.11 (0.72, 1.71) | 0.921 |
| Vitamin A intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Vitamin A intake: Q2 | 0.87 (0.49, 1.55) | 0.921 | 0.86 (0.61, 1.21) | 0.907 |
| Vitamin A intake: Q3 | 1.09 (0.62, 1.91) | 0.921 | 0.89 (0.60, 1.33) | 0.921 |
| Vitamin A intake: Q4 | 1.01 (0.50, 2.01) | 0.990 | 0.95 (0.67, 1.35) | 0.921 |
| Beta-carotene intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Beta-carotene intake: Q2 | 1.40 (0.76, 2.60) | 0.905 | 0.69 (0.48, 0.99) | 0.669 |
| Beta-carotene intake: Q3 | 0.83 (0.36, 1.91) | 0.921 | 0.82 (0.66, 1.04) | 0.802 |
| Beta-carotene intake: Q4 | 1.21 (0.62, 2.35) | 0.921 | 0.73 (0.51, 1.04) | 0.727 |
| Vitamin C intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Vitamin C intake: Q2 | 1.63 (0.70, 3.83) | 0.892 | 0.75 (0.52, 1.07) | 0.802 |
| Vitamin C intake: Q3 | 1.31 (0.55, 3.11) | 0.921 | 0.66 (0.44, 0.98) | 0.669 |
| Vitamin C intake: Q4 | 1.26 (0.74, 2.15) | 0.905 | 0.80 (0.52, 1.22) | 0.905 |
| Vitamin E intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Vitamin E intake: Q2 | 0.92 (0.47, 1.81) | 0.921 | 1.02 (0.72, 1.45) | 0.942 |
| Vitamin E intake: Q3 | 1.04 (0.50, 2.18) | 0.942 | 0.94 (0.65, 1.37) | 0.921 |
| Vitamin E intake: Q4 | 0.85 (0.46, 1.56) | 0.921 | 0.71 (0.52, 0.97) | 0.641 |
| Vitamin B6 intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Vitamin B6 intake: Q2 | 0.88 (0.36, 2.14) | 0.921 | 0.98 (0.69, 1.38) | 0.942 |
| Vitamin B6 intake: Q3 | 1.50 (0.83, 2.72) | 0.892 | 1.33 (0.89, 1.99) | 0.892 |
| Vitamin B6 intake: Q4 | 1.34 (0.58, 3.08) | 0.921 | 1.08 (0.68, 1.73) | 0.921 |
| Vitamin B12 intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Vitamin B12 intake: Q2 | 0.77 (0.33, 1.83) | 0.921 | 0.93 (0.68, 1.26) | 0.921 |
| Vitamin B12 intake: Q3 | 0.64 (0.28, 1.48) | 0.905 | 0.83 (0.59, 1.19) | 0.905 |
| Vitamin B12 intake: Q4 | 0.99 (0.46, 2.13) | 0.990 | 0.61 (0.39, 0.95) | 0.641 |
| Folic acid intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Folic acid intake: Q2 | 0.66 (0.39, 1.13) | 0.802 | 1.59 (1.08, 2.35) | 0.641 |
| Folic acid intake: Q3 | 0.81 (0.45, 1.46) | 0.921 | 1.11 (0.76, 1.60) | 0.921 |
| Folic acid intake: Q4 | 1.09 (0.55, 2.16) | 0.921 | 1.47 (1.04, 2.08) | 0.641 |
| Thiamin intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Thiamin intake: Q2 | 1.12 (0.55, 2.27) | 0.921 | 0.94 (0.64, 1.39) | 0.921 |
| Thiamin intake: Q3 | 1.04 (0.68, 1.59) | 0.942 | 1.16 (0.82, 1.63) | 0.909 |
| Thiamin intake: Q4 | 1.12 (0.49, 2.59) | 0.921 | 1.04 (0.73, 1.49) | 0.937 |
| Riboflavin intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Riboflavin intake: Q2 | 1.44 (0.79, 2.62) | 0.892 | 0.90 (0.66, 1.23) | 0.921 |
| Riboflavin intake: Q3 | 0.79 (0.42, 1.49) | 0.920 | 1.05 (0.71, 1.56) | 0.921 |
| Riboflavin intake: Q4 | 1.43 (0.64, 3.22) | 0.905 | 1.00 (0.72, 1.39) | 0.996 |
| Niacin intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Niacin intake: Q2 | 1.10 (0.65, 1.86) | 0.921 | 1.26 (0.87, 1.81) | 0.892 |
| Niacin intake: Q3 | 1.18 (0.81, 1.73) | 0.905 | 1.15 (0.82, 1.64) | 0.909 |
| Niacin intake: Q4 | 1.27 (0.68, 2.36) | 0.909 | 1.05 (0.72, 1.55) | 0.921 |
| Iron intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Iron intake: Q2 | 0.98 (0.45, 2.14) | 0.981 | 1.30 (0.89, 1.88) | 0.892 |
| Iron intake: Q3 | 0.88 (0.38, 2.04) | 0.921 | 1.23 (0.94, 1.61) | 0.802 |
| Iron intake: Q4 | 0.71 (0.36, 1.41) | 0.905 | 1.06 (0.72, 1.55) | 0.921 |
| Magnesium intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Magnesium intake: Q2 | 0.75 (0.45, 1.25) | 0.892 | 0.85 (0.56, 1.27) | 0.909 |
| Magnesium intake: Q3 | 0.78 (0.49, 1.25) | 0.905 | 0.91 (0.66, 1.27) | 0.921 |
| Magnesium intake: Q4 | 1.13 (0.53, 2.39) | 0.921 | 0.68 (0.49, 0.95) | 0.641 |
| Zinc intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Zinc intake: Q2 | 0.75 (0.41, 1.37) | 0.905 | 1.41 (0.93, 2.13) | 0.802 |
| Zinc intake: Q3 | 1.03 (0.62, 1.73) | 0.942 | 1.25 (0.80, 1.95) | 0.905 |
| Zinc intake: Q4 | 1.02 (0.50, 2.07) | 0.981 | 1.28 (0.87, 1.89) | 0.892 |
| Selenium intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Selenium intake: Q2 | 1.35 (0.75, 2.44) | 0.905 | 1.02 (0.72, 1.45) | 0.942 |
| Selenium intake: Q3 | 1.69 (1.00, 2.87) | 0.669 | 0.94 (0.60, 1.47) | 0.921 |
| Selenium intake: Q4 | 1.14 (0.59, 2.23) | 0.921 | 1.02 (0.74, 1.42) | 0.942 |
| Fiber intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Fiber intake: Q2 | 1.53 (0.69, 3.36) | 0.905 | 0.71 (0.48, 1.04) | 0.727 |
| Fiber intake: Q3 | 1.21 (0.50, 2.92) | 0.921 | 0.72 (0.50, 1.03) | 0.727 |
| Fiber intake: Q4 | 1.18 (0.54, 2.55) | 0.921 | 0.66 (0.49, 0.88) | 0.641 |
| Caffeine intake: Q1 | 1 (Ref) | 1 (Ref) | ||
| Caffeine intake: Q2 | 0.94 (0.42, 2.14) | 0.942 | 0.84 (0.62, 1.13) | 0.892 |
| Caffeine intake: Q3 | 1.04 (0.52, 2.09) | 0.942 | 0.84 (0.59, 1.20) | 0.905 |
| Caffeine intake: Q4 | 1.38 (0.59, 3.25) | 0.909 | 0.90 (0.62, 1.29) | 0.921 |
| 1 p values were corrected for multiple comparisons using the Benjamini-Hochberg false discovery rate (FDR) method | ||||
Multivariable regression analyses
Survey-weighted multivariable logistic regression models were used to assess the association between DII score and each disease outcome after adjustment for covariates of interest. Three models were generated. Model 1 adjusted for age and sex. Model 2 additionally adjusted for race/ethnicity, insurance status, poverty-income ratio, and education level. Model 3 further adjusted for BMI, physical activity, and smoking status. Results are shown in Figure 2 for asthma, allergic rhinitis, inflammatory arthritis, diabetes, and hypertension and in Figure 3 for eczema and food allergy.
Overall, there were few significant associations across the adjusted models. For asthma, no statistically significant associations were detected, although Model 3 suggested a possible, non-significant protective trend (OR 0.96, 95% CI 0.91–1.00; p=0.059). For allergic rhinitis, higher DII was associated with lower odds in Model 1 (OR 0.95, 95% CI 0.91–0.99; p=0.01), but this association was not statistically significant in Model 2 (OR 0.96, 95% CI 0.92–1.00; p=0.054) and Model 3 (OR 0.97, 95% CI 0.92–1.01; p=0.113). Among cardiometabolic outcomes, higher DII was associated with increased odds of hypertension in Model 1 (OR 1.07, 95% CI 1.03–1.12; p<0.001) and Model 2 (OR 1.06, 95% CI 1.01–1.10; p=0.011). However, this association was no longer significant in Model 3 (OR 1.03, 95% CI 0.99–1.08; p=0.149). For secondary outcomes, higher DII was associated with decreased odds of food allergy only in Model 1 (OR 0.93, 95% CI 0.89–0.98; p=0.006).
# Adjusted multivariable models----------------------------------------------------------------
## Variables for models 1, 2, 3
covariates_m1 <- c("age", "sex")
covariates_m2 <- c("age", "sex", "race", "insur", "pir", "educ")
covariates_m3 <- c("age", "sex", "race", "insur", "pir", "educ", "bmi_cat", "phys_act", "smoking")
## Helper function to build regression formulas
make_dii_formula <- function(outcome_var, covariates) {
rhs <- c(covariates, "dii")
as.formula(
paste(outcome_var, "~", paste(rhs, collapse = " + "))
)
}
## Fit models for primary outcomes
fit_primary_models <- imap(
outcome_list,
~ list(
label = .x$label,
design = .x$design,
m1 = svyglm(
make_dii_formula(.x$var, covariates_m1),
design = .x$design,
family = quasibinomial()
),
m2 = svyglm(
make_dii_formula(.x$var, covariates_m2),
design = .x$design,
family = quasibinomial()
),
m3 = svyglm(
make_dii_formula(.x$var, covariates_m3),
design = .x$design,
family = quasibinomial()
)
)
)
## Fit models for secondary outcomes
fit_secondary_models <- imap(
outcome_list_secondary,
~ list(
label = .x$label,
design = .x$design,
m1 = svyglm(
make_dii_formula(.x$var, covariates_m1),
design = .x$design,
family = quasibinomial()
),
m2 = svyglm(
make_dii_formula(.x$var, covariates_m2),
design = .x$design,
family = quasibinomial()
),
m3 = svyglm(
make_dii_formula(.x$var, covariates_m3),
design = .x$design,
family = quasibinomial()
)
)
)
# Extract adjusted multivariable associations from models 1-3----------------------------------
extract_dii_from_model <- function(fit, outcome_label, model_label) {
td <- broom::tidy(fit, exponentiate = TRUE, conf.int = TRUE) |>
filter(term == "dii")
if (nrow(td) == 0) {
return(tibble(
outcome_lab = outcome_label,
model = model_label,
OR = NA_real_,
LCL = NA_real_,
UCL = NA_real_,
p = NA_real_,
`OR (95% CI)` = "NE",
`p value` = NA_character_,
is_ne = TRUE
))
}
OR <- td$estimate[1]
LCL <- td$conf.low[1]
UCL <- td$conf.high[1]
p <- td$p.value[1]
is_ne <- !is.finite(OR) | !is.finite(LCL) | !is.finite(UCL) |
OR <= 0 | LCL <= 0 | UCL <= 0 |
is.na(OR) | is.na(LCL) | is.na(UCL) |
abs(log(OR)) > 10 | abs(log(LCL)) > 10 | abs(log(UCL)) > 10
if (is_ne) {
or_ci_str <- "NE"
p_str <- NA_character_
} else {
or_ci_str <- sprintf("%.2f (%.2f, %.2f)", OR, LCL, UCL)
p_str <- scales::pvalue(p, accuracy = 0.001)
}
tibble(
outcome_lab = outcome_label,
model = model_label,
OR = ifelse(is_ne, NA_real_, OR),
LCL = ifelse(is_ne, NA_real_, LCL),
UCL = ifelse(is_ne, NA_real_, UCL),
p = ifelse(is_ne, NA_real_, p),
`OR (95% CI)` = or_ci_str,
`p value` = p_str,
is_ne = is_ne
)
}
## Primary outcomes in adjusted models 1–3
dii_primary_models_long <- map_dfr(
fit_primary_models,
function(m) {
bind_rows(
extract_dii_from_model(m$m1, m$label, "Model 1"),
extract_dii_from_model(m$m2, m$label, "Model 2"),
extract_dii_from_model(m$m3, m$label, "Model 3")
)
}
)
## Secondary outcomes in adjusted models 1–3
dii_secondary_models_long <- map_dfr(
fit_secondary_models,
function(m) {
bind_rows(
extract_dii_from_model(m$m1, m$label, "Model 1"),
extract_dii_from_model(m$m2, m$label, "Model 2"),
extract_dii_from_model(m$m3, m$label, "Model 3")
)
}
)
# Forest plots for multivariable associations--------------------------------------------------
## Helper to construct label columns for OR and p values
add_or_p_labels <- function(df) {
df |>
mutate(
or_label = if_else(`OR (95% CI)` == "NE", "NE", `OR (95% CI)`),
p_label = if_else(
is.na(`p value`),
"",
paste0("p = ", `p value`)
)
)
}
## DII associations with primary outcomes
dii_primary_plot_data <- dii_primary_models_long |>
add_or_p_labels() |>
mutate(
outcome_lab = factor(
outcome_lab,
levels = c(
"Asthma",
"Allergic rhinitis",
"Inflammatory arthritis",
"Diabetes",
"Hypertension"
)
),
# Adjust model levels (more-adjusted model on top)
model_label = factor(
model,
levels = c("Model 3", "Model 2", "Model 1")
)
)
## Compute axis limits based on observed CIs with padding
dii_primary_limits <- dii_primary_plot_data |>
filter(!is_ne) |>
summarise(
xmin = min(LCL, na.rm = TRUE),
xmax = max(UCL, na.rm = TRUE)
)
x_min_primary <- pmax(0.1, dii_primary_limits$xmin * 0.9)
x_max_primary <- pmin(2.5, dii_primary_limits$xmax * 1.1)
dii_primary_forest_plot <- dii_primary_plot_data |>
filter(!is_ne) |>
ggplot(
aes(
x = OR,
y = model_label
)
) +
geom_point() +
geom_errorbarh(aes(xmin = LCL, xmax = UCL), height = 0.15) +
geom_vline(xintercept = 1, linetype = "dashed") +
geom_text(
aes(
label = paste0(or_label, "\n", p_label),
x = pmin(UCL, x_max_primary * 0.95)
),
hjust = -0.05,
size = 3.5
) +
scale_x_continuous(
limits = c(x_min_primary, x_max_primary),
expand = expansion(mult = c(0.02, 0.1))
) +
facet_wrap(~ outcome_lab, ncol = 2) +
labs(
x = "Adjusted odds ratio for Dietary Inflammatory Index",
y = NULL,
title = "Figure 2: Adjusted Multivariable Associations Between Dietary Inflammatory Index\nand Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, or Hypertension"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 13, hjust = 0.5),
strip.text = element_text(face = "bold", size = 11),
axis.text.y = element_text(face = "bold", size = 10),
axis.text.x = element_text(size = 10),
axis.title.x = element_text(face = "bold", size = 11),
panel.spacing = grid::unit(0.4, "lines")
)
## DII associations with secondary outcomes
dii_secondary_plot_data <- dii_secondary_models_long |>
mutate(
outcome_lab = recode(
outcome_lab,
"Atopic dermatitis" = "Eczema"
)
) |>
add_or_p_labels() |>
mutate(
outcome_lab = factor(
outcome_lab,
levels = c("Eczema", "Food allergy")
),
model_label = factor(
model,
levels = c("Model 3", "Model 2", "Model 1")
)
)
dii_secondary_limits <- dii_secondary_plot_data |>
filter(!is_ne) |>
summarise(
xmin = min(LCL, na.rm = TRUE),
xmax = max(UCL, na.rm = TRUE)
)
x_min_secondary <- pmax(0.1, dii_secondary_limits$xmin * 0.9)
x_max_secondary <- pmin(2.5, dii_secondary_limits$xmax * 1.1)
dii_secondary_forest_plot <- dii_secondary_plot_data |>
filter(!is_ne) |>
ggplot(
aes(
x = OR,
y = model_label
)
) +
geom_point() +
geom_errorbarh(aes(xmin = LCL, xmax = UCL), height = 0.15) +
geom_vline(xintercept = 1, linetype = "dashed") +
geom_text(
aes(
label = paste0(or_label, "\n", p_label),
x = pmin(UCL, x_max_secondary * 0.95)
),
hjust = -0.05,
size = 3.5
) +
scale_x_continuous(
limits = c(x_min_secondary, x_max_secondary),
expand = expansion(mult = c(0.02, 0.1))
) +
facet_wrap(~ outcome_lab, ncol = 1) +
labs(
x = "Adjusted odds ratio for Dietary Inflammatory Index",
y = NULL,
title = "Figure 3: Adjusted Multivariable Associations Between Dietary Inflammatory Index\nand Eczema or Food Allergy",
caption = "Note: Model 3 was not estimatable for eczema because of sparse outcome counts."
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 13, hjust = 0.5),
strip.text = element_text(face = "bold", size = 11),
axis.text.y = element_text(face = "bold", size = 10),
axis.text.x = element_text(size = 10),
axis.title.x = element_text(face = "bold", size = 11),
panel.spacing = grid::unit(0.4, "lines")
)
# Print forest plots---------------------------------------------------------------------------
dii_primary_forest_plotdii_secondary_forest_plotEffect modification analyses
Effect modification analyses were performed to assess whether the association between DII and each disease outcome differed by sex, BMI category, or vitamin D status. Results are shown in Table 9 for asthma, allergic rhinitis, inflammatory arthritis, diabetes, and hypertension, and in Table 10 for eczema and food allergy. Overall, only a few interactions were statistically significant and generally without a clear trend across categories. For inflammatory arthritis, BMI significantly modified the association (p<0.001), but without a clear trend across BMI categories. For diabetes, vitamin D status significantly modified the association (p<0.001), with those with vitamin D deficiency exhibiting a higher association between DII and diabetes (OR 1.70, 95% CI 1.13–2.57), while those with vitamin D insufficiency exhibiting a reduced association (OR 0.81, 95% CI 0.65–0.99). For secondary outcomes, eczema demonstrated a significant interaction with BMI (p=0.012), with an increased association seen among those who were underweight (OR 2.20, 95% CI 1.25–3.88). No significant interaction terms were observed for sex.
# Effect modification analyses-----------------------------------------------------------------
effectmod_modifiers <- tibble(
modifier_var = c("sex", "bmi_cat", "vitD25oh_cat"),
modifier_label = c("Sex", "BMI category", "Vitamin D category")
)
fit_dii_interaction_model <- function(outcome_info, covariates_base, modifier_var) {
outcome_var <- outcome_info$var
base_covars <- covariates_base
if (!(modifier_var %in% base_covars)) {
base_covars <- c(base_covars, modifier_var)
}
rhs_terms <- c("dii", modifier_var, paste0("dii:", modifier_var), base_covars)
rhs_terms <- unique(rhs_terms)
f <- as.formula(
paste(
outcome_var,
"~",
paste(rhs_terms, collapse = " + ")
)
)
svyglm(
f,
design = outcome_info$design,
family = quasibinomial()
)
}
run_dii_effectmod <- function(fit, outcome_label, modifier_var, modifier_label) {
cf <- coef(fit)
V <- vcov(fit)
mf <- model.frame(fit)
mod <- mf[[modifier_var]]
if (!is.factor(mod)) {
mod <- factor(mod)
}
levels_mod <- levels(mod)
int_pattern1 <- paste0("^dii:", modifier_var)
int_pattern2 <- paste0("^", modifier_var, ".*:dii")
interaction_terms <- names(cf)[
grepl(int_pattern1, names(cf)) | grepl(int_pattern2, names(cf))
]
if (length(interaction_terms) > 0) {
beta_int <- cf[interaction_terms]
V_int <- V[interaction_terms, interaction_terms, drop = FALSE]
stat <- as.numeric(t(beta_int) %*% solve(V_int) %*% beta_int)
df <- length(beta_int)
p_int <- 1 - pchisq(stat, df = df)
} else {
p_int <- NA_real_
}
map_dfr(
seq_along(levels_mod),
function(j) {
lvl <- levels_mod[j]
if (!("dii" %in% names(cf))) {
return(tibble())
}
beta_dii <- cf[["dii"]]
var_dii <- V["dii", "dii"]
if (j == 1L) {
beta <- beta_dii
var <- var_dii
} else {
int_name_candidates <- c(
paste0("dii:", modifier_var, lvl),
paste0(modifier_var, lvl, ":dii")
)
int_name <- int_name_candidates[int_name_candidates %in% names(cf)]
if (length(int_name) == 0L) {
return(tibble())
}
int_name <- int_name[1]
beta_int <- cf[[int_name]]
var_int <- V[int_name, int_name]
covar <- V["dii", int_name]
beta <- beta_dii + beta_int
var <- var_dii + var_int + 2 * covar
}
se <- sqrt(var)
OR <- exp(beta)
LCL <- exp(beta - 1.96 * se)
UCL <- exp(beta + 1.96 * se)
is_ne <- !is.finite(OR) | !is.finite(LCL) | !is.finite(UCL) |
OR <= 0 | LCL <= 0 | UCL <= 0 |
is.na(OR) | is.na(LCL) | is.na(UCL) |
abs(log(OR)) > 10 | abs(log(LCL)) > 10 | abs(log(UCL)) > 10
if (is_ne) {
or_ci_str <- "NE"
} else {
or_ci_str <- sprintf("%.2f (%.2f, %.2f)", OR, LCL, UCL)
}
tibble(
outcome_lab = outcome_label,
modifier_var = modifier_var,
modifier_label = modifier_label,
level = lvl,
OR = ifelse(is_ne, NA_real_, OR),
LCL = ifelse(is_ne, NA_real_, LCL),
UCL = ifelse(is_ne, NA_real_, UCL),
is_ne = is_ne,
`OR (95% CI)` = or_ci_str,
p_int = p_int
)
}
)
}
## Level order for BMI and Vitamin D
bmi_level_order <- c(
"Underweight (<18.5)",
"Normal (18.5–24.9)",
"Overweight (25–29.9)",
"Obesity I (30–34.9)",
"Obesity II (35–39.9)",
"Obesity III (≥40)"
)
vitD_level_order <- c(
"<20 (deficient)",
"20–29 (insufficient)",
"≥30 (sufficient)"
)
## Helper function to build display table
build_effectmod_display <- function(df) {
df |>
arrange(outcome_lab, modifier_var, level) |>
group_by(outcome_lab, modifier_var, modifier_label) |>
group_split() |>
map_dfr(function(grp) {
out_name <- grp$outcome_lab[[1]]
mod_lab <- grp$modifier_label[[1]]
# Fix modifier text
mod_label_clean <- case_when(
mod_lab == "BMI category" ~ "by BMI category",
mod_lab == "Vitamin D category" ~ "by Vitamin D category",
TRUE ~ paste0("by ", tolower(mod_lab))
)
p_int_vals <- grp$`p for interaction`[!is.na(grp$`p for interaction`)]
p_int_disp <- ifelse(length(p_int_vals) == 0, NA_character_, p_int_vals[1])
header_row <- tibble(
Outcome = out_name,
`Modifier level` = paste0(out_name, " ", mod_label_clean),
`OR (95% CI)` = "",
`p for interaction` = p_int_disp
)
level_rows <- grp |>
transmute(
Outcome = out_name,
`Modifier level` = as.character(level),
`OR (95% CI)` = `OR (95% CI)`,
`p for interaction` = ""
)
bind_rows(header_row, level_rows)
})
}
## Effect modification for primary outcomes
effectmod_primary_long <- map_dfr(
names(outcome_list),
function(out_nm) {
outcome_info <- outcome_list[[out_nm]]
map_dfr(
seq_len(nrow(effectmod_modifiers)),
function(i) {
mod_var <- effectmod_modifiers$modifier_var[i]
mod_label <- effectmod_modifiers$modifier_label[i]
fit_int <- fit_dii_interaction_model(
outcome_info = outcome_info,
covariates_base = covariates_m3,
modifier_var = mod_var
)
run_dii_effectmod(
fit = fit_int,
outcome_label = outcome_info$label,
modifier_var = mod_var,
modifier_label = mod_label
)
}
)
}
) |>
mutate(
`p for interaction` = ifelse(
is.na(p_int),
NA_character_,
scales::pvalue(p_int, accuracy = 0.001)
)
)
effectmod_primary_long <- effectmod_primary_long |>
mutate(
modifier_var = factor(
modifier_var,
levels = c("sex", "bmi_cat", "vitD25oh_cat")
),
level = as.character(level)
) |>
mutate(
level = case_when(
modifier_var == "bmi_cat" ~ factor(level, levels = bmi_level_order),
modifier_var == "vitD25oh_cat" ~ factor(level, levels = vitD_level_order),
TRUE ~ factor(level)
),
## Enforce outcome order for primary outcomes:
## Asthma, Allergic rhinitis, Inflammatory arthritis, Diabetes, Hypertension
outcome_lab = factor(
outcome_lab,
levels = c(
"Asthma",
"Allergic rhinitis",
"Inflammatory arthritis",
"Diabetes",
"Hypertension"
)
)
) |>
arrange(
outcome_lab,
modifier_var,
level
)
effectmod_primary_display <- build_effectmod_display(effectmod_primary_long)
tbl_effectmod_primary <- effectmod_primary_display |>
select(
Outcome,
`Modifier level`,
`OR (95% CI)`,
`p for interaction`
) |>
gt(groupname_col = "Outcome") |>
tab_header(
title = md("**Table 9: Effect Modification of the Association Between Dietary Inflammatory Index and Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension**")
) |>
fmt_markdown(columns = everything()) |>
cols_label(
`Modifier level` = md("**Modifier level**"),
`OR (95% CI)` = md("**OR (95% CI)**"),
`p for interaction` = md("**p for interaction**")
) |>
tab_options(table.font.size = px(10)) |>
tab_style(
style = cell_text(weight = "bold"),
locations = cells_row_groups()
) |>
tab_footnote(
footnote = "NE = not estimatable due to low event rates or unstable estimates",
locations = cells_body(
columns = "OR (95% CI)",
rows = `OR (95% CI)` == "NE"
)
)
## Effect modification for secondary outcomes
effectmod_secondary_long <- map_dfr(
names(outcome_list_secondary),
function(out_nm) {
outcome_info <- outcome_list_secondary[[out_nm]]
map_dfr(
seq_len(nrow(effectmod_modifiers)),
function(i) {
mod_var <- effectmod_modifiers$modifier_var[i]
mod_label <- effectmod_modifiers$modifier_label[i]
fit_int <- fit_dii_interaction_model(
outcome_info = outcome_info,
covariates_base = covariates_m3,
modifier_var = mod_var
)
run_dii_effectmod(
fit = fit_int,
outcome_label = outcome_info$label,
modifier_var = mod_var,
modifier_label = mod_label
)
}
)
}
) |>
mutate(
outcome_lab = recode(
outcome_lab,
"Atopic dermatitis" = "Eczema"
),
`p for interaction` = ifelse(
is.na(p_int),
NA_character_,
scales::pvalue(p_int, accuracy = 0.001)
)
)
effectmod_secondary_long <- effectmod_secondary_long |>
mutate(
modifier_var = factor(
modifier_var,
levels = c("sex", "bmi_cat", "vitD25oh_cat")
),
level = as.character(level)
) |>
mutate(
level = case_when(
modifier_var == "bmi_cat" ~ factor(level, levels = bmi_level_order),
modifier_var == "vitD25oh_cat" ~ factor(level, levels = vitD_level_order),
TRUE ~ factor(level)
),
## Enforce outcome order for secondary outcomes:
## Eczema, Food allergy
outcome_lab = factor(
outcome_lab,
levels = c(
"Eczema",
"Food allergy"
)
)
) |>
arrange(
outcome_lab,
modifier_var,
level
)
effectmod_secondary_display <- build_effectmod_display(effectmod_secondary_long)
tbl_effectmod_secondary <- effectmod_secondary_display |>
select(
Outcome,
`Modifier level`,
`OR (95% CI)`,
`p for interaction`
) |>
gt(groupname_col = "Outcome") |>
tab_header(
title = md("**Table 10: Effect Modification of the Association Between Dietary Inflammatory Index and Eczema or Food Allergy**")
) |>
fmt_markdown(columns = everything()) |>
cols_label(
`Modifier level` = md("**Modifier level**"),
`OR (95% CI)` = md("**OR (95% CI)**"),
`p for interaction` = md("**p for interaction**")
) |>
tab_options(table.font.size = px(10)) |>
tab_style(
style = cell_text(weight = "bold"),
locations = cells_row_groups()
) |>
tab_footnote(
footnote = "NE = not estimatable due to low event rates or unstable estimates",
locations = cells_body(
columns = "OR (95% CI)",
rows = `OR (95% CI)` == "NE"
)
)
# Print effect modification tables-------------------------------------------------------------
tbl_effectmod_primary| Table 9: Effect Modification of the Association Between Dietary Inflammatory Index and Asthma, Allergic Rhinitis, Inflammatory Arthritis, Diabetes, and Hypertension | ||
|---|---|---|
| Modifier level | OR (95% CI) | p for interaction |
| Asthma | ||
| Asthma by sex | 0.072 | |
| Female | 0.95 (0.91, 1.00) | |
| Male | 1.01 (0.96, 1.06) | |
| Asthma by BMI category | 0.808 | |
| Underweight (<18.5) | 1.04 (0.73, 1.48) | |
| Normal (18.5–24.9) | 0.97 (0.91, 1.03) | |
| Overweight (25–29.9) | 0.99 (0.92, 1.07) | |
| Obesity I (30–34.9) | 0.96 (0.89, 1.04) | |
| Obesity II (35–39.9) | 1.03 (0.93, 1.14) | |
| Obesity III (≥40) | 0.97 (0.87, 1.09) | |
| Asthma by Vitamin D category | 0.842 | |
| <20 (deficient) | 1.00 (0.79, 1.26) | |
| 20–29 (insufficient) | 0.94 (0.81, 1.09) | |
| ≥30 (sufficient) | 0.97 (0.93, 1.02) | |
| Allergic rhinitis | ||
| Allergic rhinitis by sex | 0.337 | |
| Female | 0.95 (0.89, 1.00) | |
| Male | 0.99 (0.93, 1.05) | |
| Allergic rhinitis by BMI category | 0.335 | |
| Underweight (<18.5) | 0.85 (0.62, 1.18) | |
| Normal (18.5–24.9) | 0.96 (0.91, 1.02) | |
| Overweight (25–29.9) | 0.98 (0.91, 1.06) | |
| Obesity I (30–34.9) | 1.00 (0.92, 1.08) | |
| Obesity II (35–39.9) | 1.00 (0.87, 1.16) | |
| Obesity III (≥40) | 0.81 (0.67, 0.97) | |
| Allergic rhinitis by Vitamin D category | 0.861 | |
| <20 (deficient) | 1.02 (0.83, 1.25) | |
| 20–29 (insufficient) | 0.97 (0.79, 1.18) | |
| ≥30 (sufficient) | 0.96 (0.92, 1.01) | |
| Inflammatory arthritis | ||
| Inflammatory arthritis by sex | 0.263 | |
| Female | 1.02 (0.88, 1.18) | |
| Male | 1.15 (0.94, 1.40) | |
| Inflammatory arthritis by BMI category | <0.001 | |
| Underweight (<18.5) | NE1 | |
| Normal (18.5–24.9) | 1.04 (0.89, 1.22) | |
| Overweight (25–29.9) | 1.02 (0.87, 1.18) | |
| Obesity I (30–34.9) | 1.10 (0.78, 1.55) | |
| Obesity II (35–39.9) | 1.10 (0.93, 1.31) | |
| Obesity III (≥40) | 1.12 (0.69, 1.84) | |
| Inflammatory arthritis by Vitamin D category | 0.603 | |
| <20 (deficient) | 0.72 (0.23, 2.25) | |
| 20–29 (insufficient) | 0.93 (0.72, 1.21) | |
| ≥30 (sufficient) | 1.08 (0.92, 1.26) | |
| Diabetes | ||
| Diabetes by sex | 0.641 | |
| Female | 1.04 (0.91, 1.19) | |
| Male | 0.99 (0.85, 1.15) | |
| Diabetes by BMI category | 0.994 | |
| Underweight (<18.5) | 0.98 (0.87, 1.10) | |
| Normal (18.5–24.9) | 1.04 (0.72, 1.49) | |
| Overweight (25–29.9) | 1.01 (0.84, 1.21) | |
| Obesity I (30–34.9) | 1.03 (0.83, 1.27) | |
| Obesity II (35–39.9) | 0.98 (0.82, 1.16) | |
| Obesity III (≥40) | 1.01 (0.87, 1.19) | |
| Diabetes by Vitamin D category | <0.001 | |
| <20 (deficient) | 1.70 (1.13, 2.57) | |
| 20–29 (insufficient) | 0.81 (0.65, 0.99) | |
| ≥30 (sufficient) | 1.05 (0.93, 1.18) | |
| Hypertension | ||
| Hypertension by sex | 0.285 | |
| Female | 1.06 (0.99, 1.15) | |
| Male | 1.01 (0.96, 1.07) | |
| Hypertension by BMI category | 0.135 | |
| Underweight (<18.5) | 1.08 (0.82, 1.42) | |
| Normal (18.5–24.9) | 1.07 (0.98, 1.16) | |
| Overweight (25–29.9) | 1.01 (0.94, 1.09) | |
| Obesity I (30–34.9) | 1.10 (1.01, 1.20) | |
| Obesity II (35–39.9) | 0.97 (0.87, 1.08) | |
| Obesity III (≥40) | 0.98 (0.87, 1.09) | |
| Hypertension by Vitamin D category | 0.047 | |
| <20 (deficient) | 0.82 (0.64, 1.05) | |
| 20–29 (insufficient) | 0.94 (0.82, 1.07) | |
| ≥30 (sufficient) | 1.05 (1.01, 1.10) | |
| 1 NE = not estimatable due to low event rates or unstable estimates | ||
tbl_effectmod_secondary| Table 10: Effect Modification of the Association Between Dietary Inflammatory Index and Eczema or Food Allergy | ||
|---|---|---|
| Modifier level | OR (95% CI) | p for interaction |
| Eczema | ||
| Eczema by sex | 0.341 | |
| Female | 1.02 (0.83, 1.26) | |
| Male | 0.90 (0.76, 1.06) | |
| Eczema by BMI category | 0.012 | |
| Underweight (<18.5) | 2.20 (1.25, 3.88) | |
| Normal (18.5–24.9) | 0.95 (0.76, 1.17) | |
| Overweight (25–29.9) | 1.04 (0.83, 1.30) | |
| Obesity I (30–34.9) | 0.86 (0.69, 1.08) | |
| Obesity II (35–39.9) | 1.24 (0.73, 2.10) | |
| Obesity III (≥40) | 0.86 (0.60, 1.23) | |
| Eczema by Vitamin D category | 0.184 | |
| <20 (deficient) | 0.57 (0.21, 1.54) | |
| 20–29 (insufficient) | 1.22 (0.79, 1.89) | |
| ≥30 (sufficient) | 0.96 (0.85, 1.09) | |
| Food allergy | ||
| Food allergy by sex | 0.322 | |
| Female | 0.93 (0.87, 1.00) | |
| Male | 0.98 (0.89, 1.08) | |
| Food allergy by BMI category | 0.988 | |
| Underweight (<18.5) | 0.75 (0.36, 1.56) | |
| Normal (18.5–24.9) | 0.96 (0.87, 1.07) | |
| Overweight (25–29.9) | 0.95 (0.85, 1.07) | |
| Obesity I (30–34.9) | 0.95 (0.83, 1.09) | |
| Obesity II (35–39.9) | 0.98 (0.81, 1.17) | |
| Obesity III (≥40) | 0.93 (0.78, 1.11) | |
| Food allergy by Vitamin D category | 0.313 | |
| <20 (deficient) | 0.80 (0.64, 1.00) | |
| 20–29 (insufficient) | 0.97 (0.84, 1.14) | |
| ≥30 (sufficient) | 0.95 (0.88, 1.02) | |
Machine learning analyses
Exploratory machine learning models were developed to identify predictors of disease beyond those detected in traditional regression analyses. For each outcome, four algorithms – namely LR, LASSO, RF, and XGB – were trained on one of two predictor sets: (1) core demographic, psychosocial, clinical, and dietary covariates, including the total DII score, and (2) quartiles of individual nutrient intakes. ROC curves (Figures 4-5) and AUROC values (Tables 11–12) summarize model performance for models with core covariates and nutrients only, respectively. Diabetes demonstrated the strongest performance (AUROC 0.83–0.92), followed by hypertension (AUROC 0.72–0.76) and inflammatory arthritis (AUROC 0.64–0.76). Performance was more modest for allergic rhinitis (AUROC 0.62–0.66) and asthma (AUROC 0.61–0.62), and was notably lower for food allergy (AUROC 0.49–0.59) and eczema (AUROC 0.44–0.60). Across all outcomes, models trained using nutrient-intake quartiles had comparatively reduced discrimination (AUROC 0.46–0.65).
Feature importance analyses were performed to acquire insight into potentially important predictors related to disease outcomes. In core covariate models (Figures 6–12), the DII score consistently ranked among the top 10 predictors for every disease outcome. For diabetes and hypertension, important features included markers of hyperglycemia, hyperlipiedemia, and adiposity, while DII scores, cardiometabolic factors, and psychosocial variables were influential in models for allergic outcomes. In contrast, nutrient-only models (Figures 16–19) demonstrated inconsistent patterns across disease outcomes. Nutrients such as PUFA, omega-6 fats, caffeine, total caloric intake, iron, and B vitamins appeared recurrently but without disease-specific consistency.
# Model run options, outcome definitions, and helper functions---------------------------------
options(yardstick.event_first = "second") # second outcome is disease
## Trial mode. TRUE = abbreviated, trial run (3-fold CV, smaller grids) for exploratory analysis; FALSE = full run (10-fold CV, larger grids) for final analysis
trial_mode <- FALSE
trial_max_n <- 2000 # maximum complete cases per outcome for trial runs
tidymodels_prefer()
ml_outcomes <- list(
asthma = list(
label = "Asthma",
data = asthma_reg,
outcome = "asthma",
is_primary = TRUE
),
ar = list(
label = "Allergic rhinitis",
data = ar_reg,
outcome = "ar",
is_primary = TRUE
),
arthritis = list(
label = "Inflammatory arthritis",
data = inflam_reg,
outcome = "arthritis",
is_primary = TRUE
),
htn = list(
label = "Hypertension",
data = htn_reg,
outcome = "htn",
is_primary = TRUE
),
diabetes = list(
label = "Diabetes",
data = dm_reg,
outcome = "diabetes",
is_primary = TRUE
),
ad = list(
label = "Atopic dermatitis",
data = ad_reg,
outcome = "ad",
is_primary = FALSE
),
fa = list(
label = "Food allergy",
data = fa_reg,
outcome = "fa",
is_primary = FALSE
)
)
## Helper function to add trend variables
add_ml_trend_vars <- function(data) {
### Vitamin D trend
if ("vitD25oh_cat" %in% names(data) && !"vitD25oh_cat_trend" %in% names(data)) {
data <- data |>
mutate(
vitD25oh_cat_trend = if_else(
!is.na(vitD25oh_cat),
as.numeric(vitD25oh_cat) - 1,
NA_real_
)
)
}
### DII quartile trend
if ("dii_quart" %in% names(data) && !"dii_quart_trend" %in% names(data)) {
data <- data |>
mutate(
dii_quart_trend = if_else(
!is.na(dii_quart),
as.numeric(dii_quart) - 1,
NA_real_
)
)
}
data
}
# Predictor sets-------------------------------------------------------------------------------
# Set 1 variabvles: core covariates
predictors_set1 <- c(
"age",
"sex",
"race",
"insur",
"educ",
"pir",
"food_sec",
"smoking",
"phys_act",
"alcohol",
"sbp_cat",
"dbp_cat",
"bmi_cat",
"whtr_cat",
"hdl",
"non_hdl",
"tchol",
"hba1c",
"vitD25oh_cat",
"vitD25oh_cat_trend",
"dii",
"dii_quart_trend",
"dii_etoh"
)
# Set 2 variables: individual nutrient quartiles
predictors_set2 <- c(
"kcal_quart", "carb_quart", "protein_quart", "totalfat_quart", "satfat_quart",
"pufa_quart", "mufa_quart", "n3fat_quart", "n6fat_quart", "choles_quart",
"vitA_quart", "bcarotene_quart", "vitC_quart", "vitE_quart",
"vitB6_quart", "vitB12_quart", "folicacid_quart",
"thiamin_quart", "riboflavin_quart", "niacin_quart",
"iron_quart", "mg_quart", "zn_quart", "se_quart",
"fiber_quart", "caffeine_quart"
)
# Helper functions-----------------------------------------------------------------------------
## Helper function to prepare ML data
prepare_ml_data <- function(data, outcome, predictors) {
### Add trend scores
data <- add_ml_trend_vars(data)
### Keep only outcome and predictors
vars_keep <- intersect(c(outcome, predictors), names(data))
df <- data |>
select(all_of(vars_keep)) |>
drop_na() |>
mutate(
!!outcome := factor(
.data[[outcome]],
levels = c(0, 1),
labels = c("No", "Yes")
)
)
### For trial runs: sub-sample if enough events and keep all rows for rare outcomes
if (trial_mode && nrow(df) > trial_max_n) {
case_n <- sum(df[[outcome]] == "Yes")
min_events_for_subsample <- 150L
if (case_n >= min_events_for_subsample) {
df <- slice_sample(df, n = trial_max_n)
}
}
df
}
## Helper function to train/test-split data and perform cross-validation
set.seed(1234)
make_splits_and_folds <- function(df, outcome) {
v_folds <- if (trial_mode) 3L else 10L
split_obj <- initial_split(
df,
strata = !!rlang::sym(outcome),
prop = 0.8
)
train_df <- training(split_obj)
test_df <- testing(split_obj)
folds <- vfold_cv(
train_df,
v = v_folds,
strata = !!rlang::sym(outcome)
)
list(
split = split_obj,
train = train_df,
test = test_df,
folds = folds
)
}
## Helper function for recipes
make_ml_recipe <- function(df, outcome, predictors) {
recipe(
formula = stats::as.formula(paste(outcome, "~", ".")),
data = df
) |>
step_dummy(all_nominal_predictors()) |>
step_zv(all_predictors()) |>
step_normalize(all_predictors())
}
# Model specifications-------------------------------------------------------------------------
lr_spec <-
logistic_reg() |>
set_engine("glm") |>
set_mode("classification")
lasso_spec <-
logistic_reg(
penalty = tune(),
mixture = 1
) |>
set_engine("glmnet") |>
set_mode("classification")
rf_spec_base <-
rand_forest(
trees = if (trial_mode) 500L else 1000L,
mtry = tune(),
min_n = 5L
) |>
set_engine("randomForest", importance = TRUE) |>
set_mode("classification")
xgb_spec_base <-
boost_tree(
trees = tune(),
tree_depth = tune(),
learn_rate = tune(),
min_n = 5L,
sample_size = 1.0,
loss_reduction = 0
) |>
set_mode("classification") |>
set_engine("xgboost")
## Helper function for logistic regression metrics and variable importance
get_lr_vi <- function(lr_fit_workflow) {
glm_fit <- extract_fit_engine(lr_fit_workflow)
broom::tidy(glm_fit) |>
filter(term != "(Intercept)") |>
transmute(
Variable = term,
Importance = abs(estimate),
model = "Logistic regression"
)
}
# Machine learning pipeline -------------------------------------------------------------------
run_ml_pipeline <- function(predictors, predictor_set_label) {
## Datasets and sample sizes
ml_data_list <- purrr::imap(
ml_outcomes,
~ prepare_ml_data(
data = .x$data,
outcome = .x$outcome,
predictors = predictors
)
)
ml_sample_sizes <- purrr::imap_dfr(
ml_data_list,
~ tibble(
outcome = .y,
outcome_label = ml_outcomes[[.y]]$label,
N_ml = nrow(.x),
predictor_set = predictor_set_label
)
)
ml_sample_sizes_gt <- ml_sample_sizes |>
gt() |>
tab_header(
title = paste0("ML analytic sample sizes by outcome — ", predictor_set_label)
) |>
cols_label(
outcome = "Outcome (internal name)",
outcome_label = "Outcome",
N_ml = "Complete-case N (ML)",
predictor_set = "Predictor set"
)
## Train/test splits and cross-validation folds
ml_resampling <- purrr::imap(
ml_data_list,
~ make_splits_and_folds(.x, outcome = ml_outcomes[[.y]]$outcome)
)
## Recipes and feature counts
ml_recipes <- purrr::imap(
ml_resampling,
~ make_ml_recipe(
df = .x$train,
outcome = ml_outcomes[[.y]]$outcome,
predictors = predictors
)
)
ml_feature_counts <- purrr::imap_dfr(
ml_recipes,
~ {
prep_rec <- prep(.x)
baked <- juice(prep_rec)
tibble(
outcome = .y,
outcome_label = ml_outcomes[[.y]]$label,
n_features = ncol(baked) - 1,
predictor_set = predictor_set_label
)
}
)
ml_feature_counts_gt <- ml_feature_counts |>
gt() |>
tab_header(
title = paste0("Number of ML predictors after preprocessing — ", predictor_set_label)
) |>
cols_label(
outcome = "Outcome (internal name)",
outcome_label = "Outcome",
n_features = "Number of predictors",
predictor_set = "Predictor set"
)
## Tuning grids
lasso_grid <- grid_regular(
penalty(),
levels = if (trial_mode) 5L else 30L
)
rf_grid <- grid_regular(
mtry(range = c(5L, min(30L, length(predictors)))),
levels = if (trial_mode) 3L else 5L
)
set.seed(1234)
xgb_grid <- grid_latin_hypercube(
trees(range = c(100L, 300L)),
tree_depth(range = c(3L, 5L)),
learn_rate(range = c(-2, -1)),
size = if (trial_mode) 10L else 20L
)
## Workflows per outcome
make_workflows_for_outcome <- function(recipe_obj, outcome_name, train_df) {
list(
lr = workflow() |>
add_model(lr_spec) |>
add_recipe(recipe_obj),
lasso = workflow() |>
add_model(lasso_spec) |>
add_recipe(recipe_obj),
rf = workflow() |>
add_model(rf_spec_base) |>
add_recipe(recipe_obj),
xgb = workflow() |>
add_model(xgb_spec_base) |>
add_recipe(recipe_obj)
)
}
ml_workflows <- purrr::imap(
ml_resampling,
~ make_workflows_for_outcome(
recipe_obj = ml_recipes[[.y]],
outcome_name = .y,
train_df = .x$train
)
)
ctrl_resamples <- control_grid(
save_pred = TRUE,
parallel_over = "resamples",
verbose = TRUE
)
## Model tuning and cross-validation for each outcome
tune_models_for_outcome <- function(outcome_name) {
resampling <- ml_resampling[[outcome_name]]
workflows_out <- ml_workflows[[outcome_name]]
lr_res <- workflows_out$lr |>
fit_resamples(
resamples = resampling$folds,
control = control_resamples(save_pred = TRUE),
metrics = metric_set(
roc_auc
)
)
lasso_res <- workflows_out$lasso |>
tune_grid(
resamples = resampling$folds,
grid = lasso_grid,
control = ctrl_resamples,
metrics = metric_set(roc_auc)
)
rf_res <- workflows_out$rf |>
tune_grid(
resamples = resampling$folds,
grid = rf_grid,
control = ctrl_resamples,
metrics = metric_set(roc_auc)
)
xgb_res <- workflows_out$xgb |>
tune_grid(
resamples = resampling$folds,
grid = xgb_grid,
control = ctrl_resamples,
metrics = metric_set(roc_auc)
)
list(
lr = lr_res,
lasso = lasso_res,
rf = rf_res,
xgb = xgb_res
)
}
set.seed(1234)
ml_tuning_results <- purrr::imap(
ml_outcomes,
~ tune_models_for_outcome(.y)
)
## Best hyperparameters
get_best_params <- function(tune_res) {
list(
lasso = select_best(tune_res$lasso, metric = "roc_auc"),
rf = select_best(tune_res$rf, metric = "roc_auc"),
xgb = select_best(tune_res$xgb, metric = "roc_auc")
)
}
ml_best_params <- purrr::imap(
ml_tuning_results,
~ get_best_params(.x)
)
## Final model fits and test-set performance
fit_and_evaluate_final_models <- function(outcome_name) {
outcome_info <- ml_outcomes[[outcome_name]]
resampling <- ml_resampling[[outcome_name]]
workflows_out <- ml_workflows[[outcome_name]]
best_pars <- ml_best_params[[outcome_name]]
outcome_var <- outcome_info$outcome
train_df <- resampling$train
test_df <- resampling$test
### Logistic regression
final_lr_fit <- workflows_out$lr |>
fit(data = train_df)
lr_preds_test <- bind_cols(
truth = test_df[[outcome_var]],
predict(final_lr_fit, test_df),
predict(final_lr_fit, test_df, type = "prob")
)
lr_metrics <- roc_auc(
lr_preds_test,
truth,
.pred_Yes,
event_level = "second"
) |>
mutate(
outcome = outcome_info$label,
model = "Logistic regression",
set = "Full predictors",
outcome_internal = outcome_name,
predictor_set = predictor_set_label
)
### LASSO
final_lasso_wf <- workflows_out$lasso |>
finalize_workflow(best_pars$lasso)
final_lasso_fit <- final_lasso_wf |>
fit(data = train_df)
lasso_preds_test <- bind_cols(
truth = test_df[[outcome_var]],
predict(final_lasso_fit, test_df),
predict(final_lasso_fit, test_df, type = "prob")
)
lasso_metrics <- roc_auc(
lasso_preds_test,
truth,
.pred_Yes,
event_level = "second"
) |>
mutate(
outcome = outcome_info$label,
model = "LASSO",
set = "Full predictors",
outcome_internal = outcome_name,
predictor_set = predictor_set_label
)
### Random forest
final_rf_wf <- workflows_out$rf |>
finalize_workflow(best_pars$rf)
final_rf_fit <- final_rf_wf |>
fit(data = train_df)
rf_preds_test <- bind_cols(
truth = test_df[[outcome_var]],
predict(final_rf_fit, test_df),
predict(final_rf_fit, test_df, type = "prob")
)
rf_metrics <- roc_auc(
rf_preds_test,
truth,
.pred_Yes,
event_level = "second"
) |>
mutate(
outcome = outcome_info$label,
model = "Random forest",
set = "Full predictors",
outcome_internal = outcome_name,
predictor_set = predictor_set_label
)
### XGBoost
final_xgb_wf <- workflows_out$xgb |>
finalize_workflow(best_pars$xgb)
final_xgb_fit <- final_xgb_wf |>
fit(data = train_df)
xgb_preds_test <- bind_cols(
truth = test_df[[outcome_var]],
predict(final_xgb_fit, test_df),
predict(final_xgb_fit, test_df, type = "prob")
)
xgb_metrics <- roc_auc(
xgb_preds_test,
truth,
.pred_Yes,
event_level = "second"
) |>
mutate(
outcome = outcome_info$label,
model = "XGBoost",
set = "Full predictors",
outcome_internal = outcome_name,
predictor_set = predictor_set_label
)
list(
metrics = bind_rows(
lr_metrics,
lasso_metrics,
rf_metrics,
xgb_metrics
),
preds = list(
lr = lr_preds_test,
lasso = lasso_preds_test,
rf = rf_preds_test,
xgb = xgb_preds_test
),
final_fits = list(
lr = final_lr_fit,
lasso = final_lasso_fit,
rf = final_rf_fit,
xgb = final_xgb_fit
)
)
}
ml_final_results <- purrr::imap(
ml_outcomes,
~ fit_and_evaluate_final_models(.y)
)
## Performance summary data
ml_performance_summary_long <- purrr::map_dfr(
names(ml_final_results),
~ ml_final_results[[.x]]$metrics
)
ml_performance_table <- ml_performance_summary_long |>
select(
predictor_set,
outcome_internal,
outcome,
model,
set,
.metric,
.estimate
) |>
pivot_wider(
id_cols = c(predictor_set, outcome_internal, outcome, model, set),
names_from = .metric,
values_from = .estimate
) |>
rename(
AUROC = roc_auc
) |>
arrange(outcome, desc(AUROC))
ml_performance_table_gt <- ml_performance_table |>
gt(rowname_col = "model", groupname_col = "outcome") |>
tab_header(
title = paste0("Test-set performance of ML models by outcome — ",
predictor_set_label)
) |>
fmt_number(
columns = c(AUROC),
decimals = 3
) |>
cols_label(
predictor_set = "Predictor set",
outcome_internal = "Outcome (internal)",
outcome = "Outcome",
set = "Predictor usage",
AUROC = "AUROC"
)
## Variable importance data
get_vi_all_models_for_outcome <- function(outcome_name) {
fits <- ml_final_results[[outcome_name]]$final_fits
lr_vi <- get_lr_vi(fits$lr)
lasso_vi <- vip::vi(fits$lasso) |>
mutate(
model = "LASSO",
Variable = as.character(Variable)
)
rf_vi <- vip::vi(fits$rf) |>
mutate(
model = "Random forest",
Variable = as.character(Variable)
)
xgb_vi <- vip::vi(fits$xgb) |>
mutate(
model = "XGBoost",
Variable = as.character(Variable)
)
bind_rows(lr_vi, lasso_vi, rf_vi, xgb_vi)
}
ml_vi_all <- purrr::imap(
ml_outcomes,
~ get_vi_all_models_for_outcome(.y)
)
## Pipeline outputs
list(
predictor_set_label = predictor_set_label,
predictors = predictors,
ml_sample_sizes = ml_sample_sizes,
ml_sample_sizes_gt = ml_sample_sizes_gt,
ml_feature_counts = ml_feature_counts,
ml_feature_counts_gt = ml_feature_counts_gt,
ml_data_list = ml_data_list,
ml_resampling = ml_resampling,
ml_recipes = ml_recipes,
ml_tuning_results = ml_tuning_results,
ml_best_params = ml_best_params,
ml_final_results = ml_final_results,
ml_performance_table = ml_performance_table,
ml_performance_table_gt = ml_performance_table_gt,
ml_vi_all = ml_vi_all
)
}
# Run machine learning pipelines for set 1 (core covariates) and set 2 (individual nutritional covariates)------------------------------------------------------------------------------------
results_set1 <- run_ml_pipeline(
predictors = predictors_set1,
predictor_set_label = "Set 1: main regression predictors"
)
results_set2 <- run_ml_pipeline(
predictors = predictors_set2,
predictor_set_label = "Set 2: nutrient quartiles"
)
ml_performance_all_stage1 <- bind_rows(
results_set1$ml_performance_table,
results_set2$ml_performance_table
)
# ROC curves-----------------------------------------------------------------------------------
## ROC curves for all outcomes by predictors
make_roc_panel <- function(results_obj, title_text) {
desired_order <- c(
"Asthma",
"Allergic rhinitis",
"Inflammatory\narthritis",
"Diabetes",
"Hypertension",
"Eczema",
"Food allergy"
)
outcome_label_map <- c(
asthma = "Asthma",
ar = "Allergic rhinitis",
arthritis = "Inflammatory\narthritis",
diabetes = "Diabetes",
htn = "Hypertension",
ad = "Eczema",
fa = "Food allergy"
)
### Model color mapping
custom_colors <- c(
"LR" = "#E69F00", # logistic regression
"Lasso" = "#009E73", # LASSO
"RF" = "#0072B2", # random forest
"XGB" = "#CC79A7" # XGBoost
)
outcome_names <- names(ml_outcomes)
roc_long <- purrr::map_dfr(
outcome_names,
function(out_nm) {
preds <- results_obj$ml_final_results[[out_nm]]$preds
outcome_label <- outcome_label_map[[out_nm]]
bind_rows(
roc_curve(preds$lr, truth, .pred_Yes, event_level = "second") |>
mutate(model = "LR"),
roc_curve(preds$lasso, truth, .pred_Yes, event_level = "second") |>
mutate(model = "Lasso"),
roc_curve(preds$rf, truth, .pred_Yes, event_level = "second") |>
mutate(model = "RF"),
roc_curve(preds$xgb, truth, .pred_Yes, event_level = "second") |>
mutate(model = "XGB")
) |>
mutate(
outcome = factor(outcome_label, levels = desired_order)
)
}
)
ggplot(
roc_long,
aes(x = 1 - specificity, y = sensitivity, color = model)
) +
geom_line(linewidth = 0.9) +
geom_abline(linetype = "dashed") +
coord_equal() +
facet_wrap(~ outcome, ncol = 5, drop = FALSE) +
labs(
title = title_text,
x = "1 − Specificity",
y = "Sensitivity",
color = "Model"
) +
scale_color_manual(
values = custom_colors,
breaks = c("LR", "Lasso", "RF", "XGB")
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
axis.title.x = element_text(size = 12, face = "bold"),
axis.title.y = element_text(size = 12, face = "bold"),
legend.position = "bottom",
legend.title = element_text(size = 12, face = "bold"),
legend.text = element_text(size = 12),
strip.text = element_text(size = 12, face = "bold"),
panel.spacing.x = grid::unit(1.2, "lines"),
panel.spacing.y = grid::unit(1.2, "lines")
)
}
## ROC curves for models with core covariates
roc_panel_set1 <- make_roc_panel(
results_set1,
"Figure 4: Receiver Operating Characteristic Curves\nfor Machine Learning Models With Core Covariates"
)
## ROC curves for models with individual nutritional covariates
roc_panel_set2 <- make_roc_panel(
results_set2,
"Figure 5: Receiver Operating Characteristic Curves\nfor Machine Learning Models With Individual Nutritional Covariates"
)
# Machine learning model performance summary tables -------------------------------------------
## Desired outcome order
desired_outcome_order <- c(
"Asthma",
"Allergic rhinitis",
"Inflammatory arthritis",
"Diabetes",
"Hypertension",
"Eczema",
"Food allergy"
)
## Core covariates performance summary (set 1)
core_perf <- results_set1$ml_performance_table |>
mutate(
outcome = recode(outcome, "Atopic dermatitis" = "Eczema"),
outcome = factor(outcome, levels = desired_outcome_order),
model = factor(
model,
levels = c(
"Logistic regression",
"LASSO",
"Random forest",
"XGBoost"
)
)
) |>
arrange(outcome, model)
core_perf_gt <- core_perf |>
select(
outcome,
model,
AUROC
) |>
gt(rowname_col = "model", groupname_col = "outcome") |>
tab_header(
title = md("**Table 11: Machine Learning Model Performance Summary for Models With Core Covariates**")
) |>
fmt_number(
columns = c(AUROC),
decimals = 3
) |>
cols_label(
outcome = md("**Disease**"),
model = md("**Model**"),
AUROC = md("**AUROC**")
) |>
tab_options(
table.width = pct(70)
) |>
tab_style(
style = list(
cell_text(weight = "bold")
),
locations = list(
cells_title("title"),
cells_column_labels(everything()),
cells_row_groups()
)
)
## Individual nutritional covariates performance summary (set 2)
nutr_perf <- results_set2$ml_performance_table |>
mutate(
outcome = recode(outcome, "Atopic dermatitis" = "Eczema"),
outcome = factor(outcome, levels = desired_outcome_order),
model = factor(
model,
levels = c(
"Logistic regression",
"LASSO",
"Random forest",
"XGBoost"
)
)
) |>
arrange(outcome, model)
nutr_perf_gt <- nutr_perf |>
select(
outcome,
model,
AUROC
) |>
gt(rowname_col = "model", groupname_col = "outcome") |>
tab_header(
title = md("**Table 12: Machine Learning Model Performance Summary for Models With Individual Nutritional Covariates**")
) |>
fmt_number(
columns = c(AUROC),
decimals = 3
) |>
cols_label(
outcome = md("**Disease**"),
model = md("**Model**"),
AUROC = md("**AUROC**")
) |>
tab_options(
table.width = pct(70)
) |>
tab_style(
style = list(
cell_text(weight = "bold")
),
locations = list(
cells_title("title"),
cells_column_labels(everything()),
cells_row_groups()
)
)
# Settings for ranked variable importance figures----------------------------------------------
## Disease-specific color map
disease_color_map <- c(
"Asthma" = "#5E3C99",
"Allergic Rhinitis" = "#89CFF0",
"Inflammatory Arthritis" = "#009E73",
"Diabetes" = "#F0E442",
"Hypertension" = "#D55E00",
"Eczema" = "#E78AC3",
"Food Allergy" = "#4B2E0F"
)
make_vi_plot_single <- function(results_obj,
outcome_name,
disease_title,
predictor_set_desc,
figure_label,
top_n = 10) {
vi_df <- results_obj$ml_vi_all[[outcome_name]]
vi_df <- vi_df |>
filter(!is.na(Importance), Importance != 0)
vi_df <- vi_df |>
mutate(
model = factor(
model,
levels = c(
"Logistic regression",
"LASSO",
"Random forest",
"XGBoost"
)
)
)
vi_plot_df <- vi_df |>
group_by(model) |>
slice_max(order_by = Importance, n = top_n, with_ties = FALSE) |>
arrange(Importance, .by_group = TRUE) |>
mutate(
Variable_plot = paste(model, Variable, sep = "__"),
Variable_plot = factor(Variable_plot, levels = unique(Variable_plot))
) |>
ungroup()
ggplot(
vi_plot_df,
aes(x = Importance, y = Variable_plot)
) +
geom_col(fill = disease_color_map[disease_title]) +
facet_wrap(~ model, scales = "free") +
scale_y_discrete(
labels = function(x) sub("^[^_]+__", "", x)
) +
labs(
title = paste0(
figure_label,
": Ranked Feature Importance of ",
predictor_set_desc,
"\nin Machine Learning Models for ",
disease_title
),
x = "Importance",
y = "Predictor"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
axis.title.x = element_text(size = 12, face = "bold"),
axis.title.y = element_text(size = 12, face = "bold"),
strip.text = element_text(size = 11, face = "bold"),
panel.spacing = grid::unit(1, "lines")
)
}
disease_title_map <- c(
asthma = "Asthma",
ar = "Allergic Rhinitis",
arthritis = "Inflammatory Arthritis",
diabetes = "Diabetes",
htn = "Hypertension",
ad = "Eczema",
fa = "Food Allergy"
)
core_figure_labels <- c(
asthma = "Figure 6",
ar = "Figure 7",
arthritis = "Figure 8",
diabetes = "Figure 9",
htn = "Figure 10",
ad = "Figure 11",
fa = "Figure 12"
)
nutr_figure_labels <- c(
asthma = "Figure 13",
ar = "Figure 14",
arthritis = "Figure 15",
diabetes = "Figure 16",
htn = "Figure 17",
ad = "Figure 18",
fa = "Figure 19"
)
# Generate core covariate (set 1) variable importance plots------------------------------------
vi_core_plots <- purrr::imap(
disease_title_map,
~ make_vi_plot_single(
results_obj = results_set1,
outcome_name = .y,
disease_title = .x,
predictor_set_desc = "Core Covariates",
figure_label = core_figure_labels[[.y]]
)
)
vi_core_asthma <- vi_core_plots$asthma # Figure 6
vi_core_ar <- vi_core_plots$ar # Figure 7
vi_core_arthritis <- vi_core_plots$arthritis # Figure 8
vi_core_diabetes <- vi_core_plots$diabetes # Figure 9
vi_core_htn <- vi_core_plots$htn # Figure 10
vi_core_eczema <- vi_core_plots$ad # Figure 11
vi_core_food_allergy <- vi_core_plots$fa # Figure 12
# Generate individual nutritional covariate (set 2) variable importance plots -----------------
vi_nutr_plots <- purrr::imap(
disease_title_map,
~ make_vi_plot_single(
results_obj = results_set2,
outcome_name = .y,
disease_title = .x,
predictor_set_desc = "Individual Nutritional Covariates",
figure_label = nutr_figure_labels[[.y]]
)
)
vi_nutr_asthma <- vi_nutr_plots$asthma # Figure 13
vi_nutr_ar <- vi_nutr_plots$ar # Figure 14
vi_nutr_arthritis <- vi_nutr_plots$arthritis # Figure 15
vi_nutr_diabetes <- vi_nutr_plots$diabetes # Figure 16
vi_nutr_htn <- vi_nutr_plots$htn # Figure 17
vi_nutr_eczema <- vi_nutr_plots$ad # Figure 18
vi_nutr_food_allergy <- vi_nutr_plots$fa # Figure 19
# Print ML tables and figures------------------------------------------------------------------
## Core covariates performance summary
core_perf_gt| Table 11: Machine Learning Model Performance Summary for Models With Core Covariates | |
|---|---|
| AUROC | |
| Asthma | |
| Logistic regression | 0.612 |
| LASSO | 0.607 |
| Random forest | 0.608 |
| XGBoost | 0.621 |
| Allergic rhinitis | |
| Logistic regression | 0.658 |
| LASSO | 0.654 |
| Random forest | 0.618 |
| XGBoost | 0.659 |
| Inflammatory arthritis | |
| Logistic regression | 0.709 |
| LASSO | 0.637 |
| Random forest | 0.762 |
| XGBoost | 0.704 |
| Diabetes | |
| Logistic regression | 0.827 |
| LASSO | 0.921 |
| Random forest | 0.831 |
| XGBoost | 0.894 |
| Hypertension | |
| Logistic regression | 0.760 |
| LASSO | 0.764 |
| Random forest | 0.720 |
| XGBoost | 0.732 |
| Eczema | |
| Logistic regression | 0.439 |
| LASSO | 0.449 |
| Random forest | 0.601 |
| XGBoost | 0.533 |
| Food allergy | |
| Logistic regression | 0.575 |
| LASSO | 0.589 |
| Random forest | 0.486 |
| XGBoost | 0.495 |
## Individual nutritional covariates performance summary
nutr_perf_gt| Table 12: Machine Learning Model Performance Summary for Models With Individual Nutritional Covariates | |
|---|---|
| AUROC | |
| Asthma | |
| Logistic regression | 0.527 |
| LASSO | 0.517 |
| Random forest | 0.462 |
| XGBoost | 0.499 |
| Allergic rhinitis | |
| Logistic regression | 0.540 |
| LASSO | 0.550 |
| Random forest | 0.517 |
| XGBoost | 0.542 |
| Inflammatory arthritis | |
| Logistic regression | 0.652 |
| LASSO | 0.516 |
| Random forest | 0.463 |
| XGBoost | 0.508 |
| Diabetes | |
| Logistic regression | 0.594 |
| LASSO | 0.594 |
| Random forest | 0.599 |
| XGBoost | 0.515 |
| Hypertension | |
| Logistic regression | 0.552 |
| LASSO | 0.551 |
| Random forest | 0.536 |
| XGBoost | 0.531 |
| Eczema | |
| Logistic regression | 0.475 |
| LASSO | 0.500 |
| Random forest | 0.521 |
| XGBoost | 0.544 |
| Food allergy | |
| Logistic regression | 0.510 |
| LASSO | 0.509 |
| Random forest | 0.543 |
| XGBoost | 0.520 |
## ROC panels
print(roc_panel_set1)print(roc_panel_set2)## Core covariates variable importance
print(vi_core_asthma)print(vi_core_ar)print(vi_core_arthritis)print(vi_core_diabetes)print(vi_core_htn)print(vi_core_eczema)print(vi_core_food_allergy)## Individual nutritional covariates variable importance
print(vi_nutr_asthma)print(vi_nutr_ar)print(vi_nutr_arthritis)print(vi_nutr_diabetes)print(vi_nutr_htn)print(vi_nutr_eczema)print(vi_nutr_food_allergy)5 Conclusion
In this nationally representative cohort of U.S. adults aged 20–40 years, I evaluated whether dietary inflammatory potential, measured via the Dietary Inflammatory Index, was associated with allergic, immunologic, and cardiometabolic outcomes. This was performed using both survey-weighted regression and exploratory machine learning analyses. In regression analyses, signals for higher allergic, immunologic, and/or cardiometabolic disease burden and pro-inflammatory diet were detected, but these associations largely attenuated after adjustment for sociodemographic, behavioral, and anthropometric factors. Nutrient-specific analyses identified a few potentially protective micro- and macronutrients, though most associations were modest and not consistently significant. Interestingly, in machine learning models, DII consistently ranked among the top predictors in models that included core covariates, including sociodemographic and clinical variables, although machine learning model findings should be interpreted with caution given they were not robust in their predictive performance. Overall while DII may not independently predict diseases, it may play an incremental role in driving these diseases.
This study has several strengths, including use of nationally representative NHANES data, implementaiton of survey weighting, inclusion of a validated dietary inflammatory score and nutrient composition database, and the use of complementary analytic frameworks to capture both adjusted associations and potentially important predictors in the context of machine learning analyses. However, several important limitations must be acknowledged. First, the study’s cross-sectional design precludes causal inference. Second, the accuracy of self-reported medical diagnoses and 24-hour dietary recalls may be inaccurate. Moreover, self-reported eczema and food allergy were only available in restricted NHANES cycles, a reflection of survey cycle heterogeneity and shifting survey priorities. As analyses relied on complete cases, the potential impact of missing data on the results is unclear. For the machine learning models, several outcomes yielded AUROC values near 0.5, indicating limited predictive performance ability and, in turn, limiting the interpretation of the importance of topmost predictors. In addition, the cross-sectional nature of the study limited the ability of these models to capture temporal patterns.
Future directions for this work include examining broader age ranges and a wider set of candidate diseases to better characterize how dietary inflammatory potential relates to risk across the lifespan. Longitudinal cohort studies, in which dietary patterns and clinical outcomes are tracked over time, would offer a complementary approach for addressing these questions and could be paired with multi-omic data, such as immune and metabolic biomarkers and microbiome profiles. In addition, future machine learning models could focus on enhancing model performance by identifying a more predictive set of features not considered in this analysis. Continued study and refinement of disease risk models that integrate dietary information may reveal insights to inform prevention and management strategies for chronic immune-mediated and metabolic diseases.
6 References
[1] Annesi-Maesano I, Fleddermann M, Hornef M, et al. Allergic diseases in infancy: Epidemiology and current interpretation. World Allergy Organ J. 2021;14(11):100591.
[2] Boddu SK, Giannini C, Marcovecchio ML. Metabolic disorders in young people around the world. Diabetologia.2025;68(11):2374–2385.
[3] Kim H, Sitarik AR, Woodcroft K, Johnson CC, Zoratti E. Birth mode, breastfeeding, pet exposure, and antibiotics: Associations with the gut microbiome and sensitization. Curr Allergy Asthma Rep. 2019;19(4):22.
[4] Sbihi H, Boutin RC, Cutler C, Suen M, Finlay BB, Turvey SE. How early-life environmental exposures shape the gut microbiome and influence allergic disease. Allergy. 2019;74(11):2103–2115.
[5] Kuniyoshi Y, Tsujimoto Y, Banno M, et al. Association of obesity or metabolic syndrome with allergic diseases: Overview of reviews. Obes Rev. 2025;26(3):e13862.
[6] Chang CL, Ali GB, Pham J, et al. Childhood BMI trajectories and asthma and allergies: A systematic review. Clin Exp Allergy. 2023;53(9):911–929.
[7] Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (NCHS). National Health and Nutrition Examination Survey Data, 2005–2012. Hyattsville, MD: US Department of Health and Human Services. Available from: https://wwwn.cdc.gov/nchs/nhanes/
[8] Xu Z. Improving the dietaryindex R package: A proposal to include additional components for more accurate DII computation in NHANES. Am J Clin Nutr. 2025 Jan;121(1):174–175. Epub 2024 Nov 9.
[9] Zhan JJ, Hodge RA, Dunlop AL, Lee MM, Bui L, Liang D, Ferranti EP. Dietaryindex: a user-friendly and versatile R package for standardizing dietary pattern analysis in epidemiological and clinical studies. Am J Clin Nutr. 2024 Nov;120(5):1165–1174. Epub 2024 Aug 23.
[10] Qing L, Zhu Y, Yu C, Zhang Y, Ni J. Exploring the association between dietary Inflammatory Index and chronic pain in US adults using NHANES 1999-2004. Sci Rep. 2024 Apr 16;14(1):8726.
7 Acknowledgements
I would like to thank Drs. Robert Grundmeier and Rich Tsui for their guidance on the analytic and methodological approaches used in this study. I am also grateful to the BMIN 5030 professors and teaching assistants for helping me expand my foundation in data analysis throughout the course. In addition to consulting with my mentors, several references and tools were utilized to aid in completing this project, including prior course lecture slides, practicum exercises, and online troubleshooting resources including Stack Overflow, Posit Forum, and ChatGPT (OpenAI). ChatGPT was used for selective assistance with code debugging and for designing template helper functions to automate repetitive elements of data processing. All analytic decisions, data processing, modeling choices, and final code were conceptualized, implemented, and reviewed by me with input from my co-mentors. I also wish to acknowledge the dietaryindexNDP package, which enabled calculation of DII scores that were key to this analysis.